Alter data source - sql

Power BI can connect to various data sources and run SELECT queries.
Is it possible to run also other queries (INSERT INTO, UPDATE...)?
Now I need it for a postgresql database, but could use also for others in the future.

No, you can't run directly INSERT/UPDATE queries from Power BI. This isn't the idea of the tool. If you find you need it, then probably there is a major flaw in your design, or you are not using the right tool for this job. But there are few ways to workaround this (again, I'm not saying that you SHOULD do it). Usually this is done in a combination with custom written Power App, embedded in your report in Power Apps visual. The idea is that the app will write to the database, and will refresh your report after that (if needed).
You can start here and I will recommend you to look at this in-depth session - Writing back data to PowerBI from your reports.

The answer is No if I am very straight forward. PBI is a analysis platform for data. There are probably some advance way to do that but, this is not logical or good idea to think about manipulating data from report or from any BI tools. You can search answers from different blog where the same questions asked. For more details, you can check below links-
help link 1
help link 2

Related

Best way to save SQL versions while working on Tableau?

I am working using Tableau and have to write down multiple different SQL each time, while making new data sources.
I have to save all changes on SQL for every data source.
Currently I would paste the SQL on notepad and save them on separate folder in my computer, along with description of the changes.
Is there any better way to do this?
Assuming you have permission to create objects in the database, begin by creating database views, As #Nick.McDermaid commented.
Then, instead of using Custom SQL data source in Tableau, just connect to the View as if it were a table.
If you need to track the changes to these SQL views of your data, you will need to learn how to use source control for the .sql files that can be scripted from within SQL Server Management Studio:
Your company or school may have a preferred source control system already in use, in which case you should use that. If they don't, or if you are learning at home, then Git and Subversion are popular open source choices.
There are many courses available on learning platforms like Coursera that will teach you how to learn how to use those systems.
I had similar problem as you.
We ended up writing the queries in SQL Editor SQL Work bench (https://www.sql-workbench.eu/), then managed the code history and performed code peer-review (logic, error check, etc) in team shared space (like confluence).
The reasons we did that is
1) SQL queries are much easy to write on Work Bench
2) Code review is a must! You will find through implementing a review process more mistakes than you could ever think about
3) The shared space is just really convenient as it is accessible by everyone, and all errors are documented. After sometimes you get a lot of visible knowledge accumulated.
I also totally agree with Nick as this is one step to a reporting solution. But developing a whole reporting server is heavy, costly and takes time. Unless management are really convinced of the importance of developing a reporting solution, you may have to get a workaround with queries and Tableau (at least that was the case for us)
A little late to the party, but I would suggest you simply version the tableau workbook. The contents of the workbook are XML, so perfect for versioning using file based tools (Dropbox, One Drive, etc.) or source control (git, etc.). The workbooks themselves are usually quite small, so just make sure to keep the extract data separate if you use it.

Pentaho slow reporting

have been working on a projet about data integration, analysing and reporting using Pentaho. So at last, I needed to do some reporting using Pentaho report designer, weekly. The problem that is our data is so big (about 4M/day), so the reporting platform was too slow and we can't do any other queries from tables in use, until we kill the process. Is there any solution to this ? A reporting tool or platform that we can use instead of Pentaho tool without having to change the whole thing and get from the first ETL steps.
I presume you mean 4M records/day.
That’s not particularly big for reporting, but the devil is in the details. You may need to tweak your db structure and analyze the various queries at play.
As you describe it, there isn’t much info to go with to give you a more detailed answer.

Writing Reports with SSDT Visual Studio - Guidance

Broadly speaking, can someone tell me if I'm headed in the right direction?
I now know how to write SQL Queries pretty well.
I would like to start aggregating multiple queries onto one "form"/template (not sure if that's the correct terminology).
I have access to lots of clean data in the form of Excel Files.
I should be able to load the excel files into Visual Studio and then write reports that refer to those excel files as databases, am I right?
I haven't been able to find a great SSDT tutorial yet, but I'll keep looking. Thanks for any feedback.
First off, I apologize that I'm writing a bit of a novel here. My understanding of your question is that you're looking for architectural guidance on the best way to go, and that's not a quick answer.
If, however, I've misinterpreted your intent and you are actually just looking for how to code up an excel file as a database, there is already a lot of articles online that you can google.
Regarding your architectural question...it is really going to come down to choosing the best trade-offs for what you're building. I will give you some pointers that I have learned and hopefully it is helpful to you and others in the community.
I would be very hard pressed to advise that you use an excel file as a database.
While it might seem like the most straight forward solution, the trade-offs here are very costly in debugging file locking issues and dealing with excel specific errors, it becomes a death by a thousand cuts. It is certainly possible, but this is a trap that I personally fell into early in my career.
Here's is a link to some descriptions of the problems that you'd have with an excel file database and here is a 2nd link.
To paraphrase your question, it sounds like you're developing a personal ETL application for improving your productivity and your company's metrics. Spreadsheets come into your e-mail inbox and transformed versions of the spreadsheets go out of your e-mail outbox. You are wanting to look at the departments' data from a historical and comparative perspective. I have done this many times in the past as well and it is a very reasonable goal.
The best way that I have found to do this is to use a SQL Server database. You can start this out in phases of minimal viable product to do this in small easy chunks.
Phase I:
Download and install SQL Server 2016 Express free. Make sure
to install localdb when you install SQL Server 2016. See the localdb
instructions for more information.
Create the localdb instance on the command line.
Connect to the new localdb instance in SQL Server
Create a new Database that you'll use for importing the data. Give it a name like "ReportData"
Import the excel files received from the variety of businesses into the new database. This stackoverflow answer gives a great description of how to do it. Here is an alternate example.
If you get any error messages about drivers you may need to download the correct drivers.
Develop your SQL queries that you want to use. For simplicity, I'm just showing a basic select statement here, but you can build some sophisticated SQL queries for aggregating the data in this step.
Export the data from the excel file into a CSV file or an excel file. You do this by right clicking in the "Results" area and selecting "Save Results As..."
Manually copy and paste the resulting values into the excel templates that you would like.
Note step 9 will be automated soon, but it is better for now to understand your domain objects and be thinking about the business logic that you're building in a quick iterative manner.
Phase II:
Create a new Console application in Visual Studio that will transform the data from the database into an Excel file output. The most powerful way to do this is to use EPPlus. Here is a detailed explanation on how to do this.
Note, when you run the source code from the detailed explanation link, you need to change the output path first, for example to c:\temp. Note also that there are plenty of other Excel spreadsheet helper packages out there, so feel free to look around at other packages as well. EPPlus is simply one that I have been successful with in my projects. This Console application will be querying your SQL Server database using the queries that you built in step 7 above.
Phase III:
In time, you many find that co-workers and managers within your company want to start accessing your data directly through a web page...
At a high level, the steps you would take are:
Backup the database and restore it onto a server.
Implement a simple MVC application
Perhaps even build web pages to allow users to import excel so that they don't need to e-mail them to you any longer.
An additional note, there are Enterprise level ETL and reporting tools out there as well, such as SSIS/SSRS, etc that you could look into if you're looking for a more sophisticated tool set, but I didn't get that impression with your question.
I hope that this answer helps and isn't too long winded. Please let me know if any of the steps are unclear, I know it's a lot of information in one post.

Is there a way to let non-technical individuals utilize BigQuery reports?

I want to have an access port for non-tech savvy individuals in which they could make reports of their own without needing to know SQL what-so-ever.
It would be best if I could create custom fields of myself, and then just let the users in the access port pick and choose whichever they like with a custom date range.
I've explored the options Google Data Studio offers, but it looks to me like it mostly puts an emphasis on data visualization.
In addition, my attempts to make custom queries with it were not successful, since the platform is rigid in terms of deciding which field is a metric and which is a dimension (and it does so inaccurately). This makes it hard to query reports as you normally would using BigQuery, which doesn't have these somewhat arbitrary limitations.
Perhaps I've misunderstood something about the platform due to my limited experience with it, but it looks like Data Studio isn't going to fit the bill for me.
EDIT: In addition, the platform should have a way of exporting said reports as CSV files, a feature that Data Studio doesn't have as far as I know.
It would be great to receive suggestions for a different platform which would better fit my needs, or even suggestions on how to make better use of Data Studio.
Have you looked at using a tool like redash (https://redash.io)? Assuming your GA360 data is in BigQuery you can connect redash to BQ. Then you can author queries and visualize.
You can also use the Google Could SDK to connect to BQ and run custom queries to generate new tables in BQ based on the GA360 session data. Then use redash, or any tool, to report/visualize.

Proper way to move data to a data warehouse

I am in the middle of a small project aimed to eventually create a data warehouse. I am currently moving data from a flat file system and two SQL Server databases. The project started in C# to automate the processing of data from the flat file system. Along with this, the project executes stored procedures to bring data from other databases. They are accessing the data from other databases using linked servers.
I am wondering if this is incorrect as even though it does get the job done, there may better approach? The other way I have thought about this is to use the app to pull data from each DB then push it to the data warehouse, but I am not sure about performance. Is there another way? Any path that I can look into is appreciated.
'proper' is a pretty relative term. I have seen a series of stored procedures, SSIS (microsoft), and third party tools. THey each have some advantages
Stored procedures
Using a job to schedule a series of stored procedures that insert rows from one server to the next works. I find sql developers more likely to take this path...it's flexible in design and good SQL programmers can accomplish nearly anything in here. That said, it is exceedingly difficult to support / troubleshoot / maintain / alter (especially if the initial developer(s) are no longer with the company). There is usually very poor error handling here
SSIS and other tools such as pentaho or data stage or ...google search it, theres a few.
This gives a more graphical design interface, although I've seen SSIS packages that simply called a stored procedures in order that may as well just been a job. These tools are really what you make of them. They give very easy to see work flows and are substantially robust when it comes to error handling and troubleshooting ability (trust me, every ETL process is going to have a few bad days and you'll be very happy for any logging you have to identify what you want). I find configuring a servers resources (multiple processors for example) is significantly easier with these tools. They all come with quite the learning curve though.
I find SQL developers are very much inclined to use the stored procedure route while people from a DBA background are generally more inclined to use the tools. If you're investing the time into it, the SSIS or equivlent tool is a better way to go from the future of your company standpoint, though takes a bit more to implement.
In choosing what to use you need to consider the following factors:
How much data are we talking about moving and how quickly does it need to be moved. There is s huge difference between using a linked server to move45,000 records and using it to move 100,000,000 records. Consider alo the expected growth of the data set to be moved over time. A process taht is fine in the early stages may chocke and die once you get more records. Tools like SSIS are much faster once you know how to use them which brings us to point 2.
How much development time do you have and what tools does the developer and the person who will maintain the import over time know? SSIS for instance is a complex tool, it can take a long time to feel comfortable with it.
How much data cleaning and transforming do you need to do? What kind of error trapping and exception processing do you need, what kind of logging will you need? The more complex the process, the more likely you will need to bite the bullet and learn an ETL specific tool.
Even there is a few answers, and I agree with two of them, I have to give my subjective opinion about the wider picture.
I am in the middle of a small project aimed to eventually create a data warehouse.
Question name perfectly suits to your question description. It could be very helpful to future readers. So, your project should create data warehouse. However it's small, learn to develop projects with scalability. Always!
In that point of view, search and study about how data warehouse project should look like. And develop each step.
Custom software vs Stored Procedures (Linked DBs) vs ETL
Custom software (in this case your C# project) should be used in two cases:
Medium scale projects where budget ETL cannot do everything
You're working for Enterprise level IT company, so developing your solution is cheaper and more manageable
And perhaps you think for tiny straight-forward projects. But NO, because those projects can grow and very quick outgrow your solution (new tables, new sources, changing ERP or CRM, ect).
If you're using just SQL Server, if you no need for data cleansing, if you no need for data profiling, if you no need for external data, Stored Procedures are OK. But, a lot of 'ifs' is here. And again, you're loosing scalability (your managment what's to add some data from Google Spreadsheet they internly use, KPI targets for example).
ETL tools are one native step in data warehouse development. In begining, there could be few table copy operation, or some SQL's, one source, one target. As far as your project is growing, you can adding new transformations.
SSIS is perhaps best as you're using SQL Server, but there is some good, free tools.