Issue with partitions in an SSAS tabular model in DirectQuery mode

I am trying to create a sample partition in a tabular model database in DirectQuery mode, and I got the following error after setting the filter and trying to import:
"Failed to save modifications to the server: Error returned: 'A table that has partitions using DirectQuery mode and a Full DataView can have only one partition in DirectQuery mode. In this mode, table 'FactInternetSales' has invalid partition settings. You might need to merge or delete partitions so that there is only one partition in DirectQuery mode with Full Data View.'"
Would anyone please help me understand the issue? Thank you.

A DirectQuery model is one that doesn't cache the data in the model. Instead, as the DirectQuery model is queried, it in turn generates queries against the backend SQL data source at query time. This is in contrast to an Import model, where the source data is imported ahead of time and compressed in memory for snappy query performance. Import models require periodic refreshes so the data doesn't get stale. DirectQuery models don't require refreshes since they always reflect what's in the source system.
The error you got is self-explanatory. A DirectQuery model should have only one partition per table, and that partition's query must cover 100% of the data the model should expose for that table. So check the partitions on FactInternetSales, remove all but one, and remove the WHERE clause from the remaining partition's query.
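For illustration, the single remaining DirectQuery partition query could be an unfiltered SELECT over the whole table; this is a minimal sketch, and the dbo schema is an assumption about your source database:

-- Single partition query for FactInternetSales in DirectQuery mode.
-- No WHERE clause, so the one partition covers the entire table.
SELECT *
FROM dbo.FactInternetSales;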

Related

Kylin Cardinality Calculation

I have a Kylin Cube built on some data that is partitioned on date. Whenever a new date's data is added into hive, Kylin is not able to detect it. Is this normal behaviour?
Currently I am manually reloading the table in the data sources tab. This triggers a recalculation of cardinality. The data is too big and the cardinality calculation takes too long.
Can anyone help me? Am I missing anything?
When new data is added to the Hive external table, you must run:
MSCK REPAIR TABLE table_name;
Kylin will then be able to read the new partitions.
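As a sketch (the table name here is just a placeholder), you can repair the metastore and then confirm that the new partitions are visible before reloading the table in Kylin:

-- Register partitions that exist in HDFS but are missing from the Hive metastore
MSCK REPAIR TABLE sales_events;
-- Verify that the newly added partitions are now listed
SHOW PARTITIONS sales_events;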

Visualization Using Tableau

I am new to Tableau, I am having performance issues, and I need some help. I have a Hive query result in Azure Blob Storage named part-00000.
The performance issue arises when I execute a custom query in Tableau to generate graphical reports.
Can I do this? How?
I have about 7.0 million rows of data in the Hive table.
You can find the custom query option in the data source connection (see the linked image).
You might want to consider creating an extract instead of using a live connection. Additional considerations include hiding unused fields and applying filters at the data source level to limit the data to what is required.
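As an illustration of filtering at the data source level, the custom SQL behind the connection could restrict rows and columns before Tableau builds the extract; the table and column names below are assumptions, not from your schema:

-- Pull only the columns and the date range the reports actually need
SELECT order_id, order_date, region, amount
FROM sales
WHERE order_date >= '2017-01-01';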

PDI or mysqldump to extract data without blocking the database nor getting inconsistent data?

I have an ETL process that will run periodically. I was using kettle (PDI) to extract the data from the source database and copy it to a stage database. For this I use several transformations with table input and table output steps. However, I think I could get inconsistent data if the source database is modified during the process, since this way I don't get a snapshot of the data. Furthermore, I don't know if the source database would be blocked. This would be a problem if the extraction takes some minutes (and it will take them). The advantage of PDI is that I can select only the necessary columns and use timestamps to get only the new data.
On the other hand, I think mysqldump with --single-transaction would let me get the data in a consistent way without blocking the source database (all tables are InnoDB). The disadvantage is that I would get unnecessary data.
Can I use PDI, or do I need mysqldump?
PS: I need to read specific tables from specific databases, so I think xtrabackup is not a good option.
"However, I think I could get inconsistent data if the source database is modified during the process, since this way I don't get a snapshot of the data."
I think the "Table Input" step doesn't take into account any modifications that happen while you are reading. Try a simple experiment:
Take a .ktr file with a single Table Input and Table Output and try loading the data into the target table. While the load is in progress, insert a few records into the source database. You will find that those records are not read into the target table. (Note: I tried this with a PostgreSQL database and the number of rows read was 1,000,000.)
Now for your question, I suggest using PDI since it gives you more control over the data in terms of versioning, sequences, SCDs, and all the DW/BI-related activities. PDI makes it easier to load the staging environment rather than simply dumping entire tables.
Hope it helps :)
Interesting point. If you do all the table inputs in one transformation then at least they all start at the same time, but while that is likely to be consistent, it's not guaranteed.
There is no reason you can't use PDI to orchestrate the process AND use mysqldump. In fact, for bulk insert or extract it's nearly always better to use the vendor-provided tools.
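For context, mysqldump's --single-transaction option gets its consistency from an InnoDB snapshot; a rough SQL sketch of what it does under the hood (the table names are placeholders) looks like this:

-- Open a consistent snapshot; all reads inside the transaction see the data
-- as of this point, and other sessions are not blocked from writing
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT * FROM orders;
SELECT * FROM order_lines;
COMMIT;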

Do I need to import the full data in SSAS tabular mode from my SQL database in DirectQuery mode?

I'm trying to build an Analysis Services tabular project and want to use DirectQuery mode so that the queries are executed at the backend.
When I click on the model and select Import From Data Source, I see an option to retrieve the full data. I have a billion rows in my fact table and I don't want to import the entire data set when building the model. Am I missing something here? DirectQuery in Tabular, from what I understand, is similar to ROLAP storage mode in the Multidimensional world, where there is no process step and queries get real-time data. So what's the point of importing all the data when building the model?
If it is just to get the schema of the tables, why not query the database for the schema instead of importing the full data? Can someone explain?
When you go through the Import From Data Source wizard, select "Write a query that will specify the data to import". Write a query that imports only one row, such as SELECT TOP 1 * FROM <table_name>. That will import just one row along with the schema.
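For example, against a billion-row fact table the design-time import query can stay this small (the table name below is only illustrative):

-- Import a single row at design time; DirectQuery generates the real queries at runtime
SELECT TOP 1 * FROM dbo.FactInternetSales;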

How to create a Temporary Table using (Select * into ##temp from table) syntax (for MS SQL) using Pentaho Data Integration

When I use the above syntax in the "Execute row script" step, it shows success but the temporary table is not created. Please help me out with this.
Yes, the behavior you're seeing is exactly what I would expect. It works fine from the T-SQL prompt, throws no error in the transform, but the table is not there after the transform completes.
The problem here is the execution model of PDI transforms. When a transform is run, each step gets its own thread of execution. At startup, any step that needs a DB connection is given its own unique connection. After processing finishes, all steps disconnect from the DB. This includes the connection that defined the temp table. Once that happens (the defining connection goes out of scope), the temp table vanishes.
Note that this means in a transform (as opposed to a Job), you cannot assume a specific order of completion of anything (without blocking steps).
We still don't have many specifics about what you're trying to do with this temp table and how you're using its data, but I suspect you want its contents to persist outside your transform. In that case, you have some options, but a global temp table like this simply won't work.
Options that come to mind:
1. Convert the temp table to a permanent table. This is the simplest solution: you're basically making a staging table, loading it with a Table Output step (or whatever), and then reading it with Table Input steps in other transforms (see the SQL sketch below).
2. Write the table contents to a temp file with something like a Text File Output or Serialize to file step, then read them back in from the other transforms.
3. Store rows in memory. This involves wrapping your transforms in a Job and using the Copy rows to result and Get rows from result steps.
Each of these approaches has its own pros and cons. For example, storing rows in memory will be faster than writing to disk or network, but memory may be limited.
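As a minimal sketch of the first option (the schema and table names are assumptions about your environment), the SQL that currently creates ##temp would instead load a permanent staging table that later transforms can read:

-- Rebuild a permanent staging table instead of a global temp table;
-- it survives after every PDI step has disconnected from the database
IF OBJECT_ID('dbo.stage_orders', 'U') IS NOT NULL
    DROP TABLE dbo.stage_orders;

SELECT *
INTO dbo.stage_orders
FROM dbo.source_orders;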
Another step it sounds like you might need, depending on what you're doing, is the ETL Metadata Injection step. This step allows you in many cases to dynamically move metadata from one transform to another. See the docs for descriptions of how each of these works.
If you'd like further assistance here, or I've made a wrong assumption, please edit your question and add as much detail as you can.