Snowflake View nested calculation - sql

I have to build a view that fetches data from 7-8 tables, and some fields are calculated from other calculated fields. For example, the first calculation is if(indicator='H', amount*20, amount) as deliAmt. And then
If(isnull(deliAmt),0 else deliAmt)
This is just an example, but this view requires 5-6 such calculations.
The final view has around 7-8 main tables, plus other tables that supply the columns for these calculations. In total there will be 57 columns.
Please advise on the best approach to implement this.

To write a view that selects data from 7-8 tables, write the SQL that selects from those 7-8 tables and put it "in a view".
The other part of your question, how to do IF-like logic, is handled in Snowflake by the IFF function. Thus your example if(indicator='H', amount*20, amount) as deliAmt
would be written
IFF(indicator='H', amount*20, amount) as deliAmt
and If(isnull(deliAmt),0 else deliAmt) would be:
IFF(deliAmt IS NULL, 0, deliAmt)
which can also be done via ZEROIFNULL like:
ZEROIFNULL(deliAmt)
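For fields that are calculated from other calculated fields, one common pattern is to compute the first level in a CTE (or subquery) and the second level in the outer SELECT of the view. Below is a minimal sketch of that layering. It runs on SQLite through Python's sqlite3 purely as a stand-in for Snowflake, so CASE and COALESCE take the place of IFF and ZEROIFNULL; the table `deliveries`, its rows, and the names `delivery_view` and `deliAmtOrZero` are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical source table standing in for one of the 7-8 real tables.
con.execute("create table deliveries (id integer, indicator text, amount real)")
con.executemany(
    "insert into deliveries values (?, ?, ?)",
    [(1, "H", 5.0), (2, "L", 7.0), (3, "H", None)],
)

# Level 1 (the CTE) computes deliAmt; level 2 (the outer SELECT) builds on it.
con.execute("""
    create view delivery_view as
    with step1 as (
        select id,
               case when indicator = 'H' then amount * 20 else amount end as deliAmt
        from deliveries
    )
    select id,
           deliAmt,
           coalesce(deliAmt, 0) as deliAmtOrZero
    from step1
""")

rows = con.execute(
    "select id, deliAmt, deliAmtOrZero from delivery_view order by id"
).fetchall()
for row in rows:
    print(row)
```

Each additional dependent calculation can be added as another CTE layer, which keeps the view readable when there are 5-6 of them.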

Related

How do I select attributes from another selection of elements

I am using Excel's Power Query in order to test a SQL query that I am eventually going to use in order to make a pivot table that stays updated with the database. The database is accessed through an ODBC.
The problem is not related to Power Query itself but simply the SQL request.
Here I am trying to select all bills from the "facturation" (French database) table that are from the current year (2021). I am naming this selected data FACTURES_ANNEE_COURANTE.
Then I want to also select some attributes of those items from 2021 in order to display them in the pivot table, but only on the selection that I just made in order to only select (and show) bills from the current year.
select * as FACTURES_ANNEE_COURANTE
from facturation
where year(date_fact)=2021 limit 3, select date_fact from FACTURES_ANNEE_COURANTE
I only have very basic knowledge of SQL, and the second part of my request does not work (the first part does). I'm trying to do this so I can show these specific attributes in the pivot table. What's the proper way to select attributes only from my first selection of elements from the facturation table?
Thank you for your help.
A major advantage of Power Query is being able to generate complex logic without needing to be able to code in SQL. So I would abandon writing hand coded SQL - there's no need.
Before PQ came out I had 2 decades of experience writing complex SQL. After PQ came out I've written almost none - the SQL code generated by PQ is good enough, you can easily add complex transformations that are hard/impossible in SQL, and overall developing and debugging is 10x easier.
For your scenario, I would build a PQ query just using the navigation to select your facturation table. Then I would use the PQ UI to Filter (instead of a SQL where clause) and Choose Columns (to restrict the columns returned).
Whatever other transformations you need are likely met using a button in the PQ UI.
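If hand-written SQL is still wanted, the usual way to select from a previous selection is a derived table: the first SELECT goes inside the FROM clause and receives the alias. Here is a minimal sketch, assuming SQLite through Python's sqlite3 as a stand-in for the ODBC source. The rows and the `id` and `montant` columns are hypothetical, and strftime('%Y', ...) replaces year(), which SQLite does not provide.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical miniature version of the facturation table.
con.execute("create table facturation (id integer, date_fact text, montant real)")
con.executemany(
    "insert into facturation values (?, ?, ?)",
    [(1, "2020-11-03", 120.0), (2, "2021-01-15", 80.0), (3, "2021-06-30", 45.5)],
)

# The inner SELECT is the "first selection"; the outer SELECT picks attributes from it.
rows = con.execute("""
    select FACTURES_ANNEE_COURANTE.date_fact
    from (
        select *
        from facturation
        where strftime('%Y', date_fact) = '2021'
    ) as FACTURES_ANNEE_COURANTE
    order by FACTURES_ANNEE_COURANTE.date_fact
""").fetchall()

print(rows)
```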

How to populate all possible combination of values in columns, using Spark/normal SQL

I have a scenario, where my original dataset looks like below
Data:
Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,10
Argentina,Vegetable,2011,Yield,90
Argentina,Vegetable,2011,Production,9
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
Now there is a very small country lookup table that lists every country the source data can contain.
I want to have the output data's number of columns always fixed (this is to ensure the reporting/visualization tool doesn't get dynamic number columns with every day's new source data ingestions depending on the varying distinct number of countries present).
So, I've to somehow join the source data with the country_lookup csv and populate all those columns with default value as F. Every country column would be binary with T or F being the possible values.
The original dataset from the above has to be converted into below:
Data (for rows whose Type is Derived Yield, I've left Amount as an unevaluated expression rather than calculating it, so that you can match it against the formula below):
Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
US,Vegetable,2011,Yield,18,T,F,F,F,F,F
US,Vegetable,2011,Production,3,T,F,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Formulae for populating Amount Field for Derived Type:
Derived Amount = sum of Harvested over all countries marked T (True), grouped by Year and Commodity, divided by the sum of Production over all countries marked T (True), grouped by Year and Commodity.
So, the target is to have a combination of all the countries from source and calculate the sum of respective Harvested and Production values which then has to be divided. The commodity can be more than one in the actual scenario for any given country, but that should not bother as the summation of amount happens on grouped commodity and year.
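The formula above can be sketched in plain Python to make the combination logic concrete. This is only an illustration of the arithmetic, not a Spark implementation: it uses the 2010 rows from the source data, enumerates every combination of two or more countries with itertools.combinations, and assumes every country has both a Harvested and a Production row. The function name `derived_yields` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Harvested and Production rows for 2010, taken from the source data above.
rows = [
    ("US", "Vegetable", 2010, "Harvested", 2.44),
    ("US", "Vegetable", 2010, "Production", 6.48),
    ("Argentina", "Vegetable", 2010, "Harvested", 15.2),
    ("Argentina", "Vegetable", 2010, "Production", 2.66),
    ("Bhutan", "Vegetable", 2010, "Harvested", 7.0),
    ("Bhutan", "Vegetable", 2010, "Production", 5.0),
]


def derived_yields(rows):
    """Map (commodity, year, countries) to sum(Harvested) / sum(Production)."""
    # (commodity, year) -> country -> {"Harvested": x, "Production": y}
    groups = defaultdict(lambda: defaultdict(dict))
    for country, commodity, year, row_type, amount in rows:
        if row_type in ("Harvested", "Production"):
            groups[(commodity, year)][country][row_type] = amount

    result = {}
    for (commodity, year), by_country in groups.items():
        countries = sorted(by_country)
        # Every combination of at least two countries present in this group.
        for size in range(2, len(countries) + 1):
            for combo in combinations(countries, size):
                harvested = sum(by_country[c]["Harvested"] for c in combo)
                production = sum(by_country[c]["Production"] for c in combo)
                result[(commodity, year, combo)] = harvested / production
    return result


result = derived_yields(rows)
for key, value in sorted(result.items()):
    print(key, round(value, 4))
```

With n countries in a group this produces 2^n - n - 1 combinations, so the number of pre-populated rows grows quickly as countries are added.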
Note: Users in the frontend can select any combination of countries. The only reason for doing this in the backend rather than dynamically in the frontend is that AWS QuickSight (our visualisation tool) can compute sums over the selected column filters but does not yet support calculations on those derived sums. Hence the calculation for every combination of countries has to be pre-populated (a very naive approach) so that it is available in the report for whatever countries the user selects.
Also, if you have a better approach than the naive one described in the note, you are most welcome to guide me. I've also posted a question on the same problem without describing my expected approach, so that experts can show how this kind of problem can be solved better. If you want to help solve it with some other technique, here is the link to that question.
Any help shall be greatly acknowledged.

How do you "unpack" a query that queries views in Google BigQuery?

Suppose that I have a view in BigQuery, e.g. [views.myview] defined as follows:
SELECT
Id AS Id,
MAX(Time) AS MostRecentTime
FROM
[dataset.mytable]
GROUP BY
Id
And then another query that queries that view:
SELECT
*
FROM
[dataset.mytable] tbl
JOIN [views.myview] mview ON tbl.Time = mview.MostRecentTime
Is there a way to automatically generate a query where the [views.myview] in the second query is replaced with the query that generates it - basically "unpacking" the views so you have just one query that queries tables directly?
(The underlying problem: I have a query which queries many different views, including several layers of views-querying-other-views, and I want to put this query in my application. I don't want a user to be able to mess with the results of the query by changing the definition of one of the views, so I want to put the whole query in a fixed form in the application.)
This is not possible to do 'automatically'. You could try writing a script or some code to do this through the BigQuery APIs - https://cloud.google.com/bigquery/docs/managing-views
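As a rough idea of what such a script might look like, here is a minimal sketch that inlines view definitions by plain text substitution. It does not call BigQuery at all: the `definitions` dictionary is assumed to have been filled beforehand (for example from the view metadata), and the function name `unpack_views` is hypothetical. It only recognises legacy-SQL references of the form [dataset.name] and does not parse SQL, so treat it as a starting point rather than a robust tool.

```python
import re

# Matches legacy-SQL style references such as [views.myview] or [dataset.mytable].
REFERENCE = re.compile(r"\[([\w.]+)\]")


def unpack_views(sql, definitions, _seen=()):
    """Replace each known view reference with its parenthesised definition."""

    def replace(match):
        name = match.group(1)
        if name not in definitions:
            return match.group(0)  # a plain table: leave it untouched
        if name in _seen:
            raise ValueError(f"circular view reference: {name}")
        # Recurse so that views querying other views are unpacked too.
        inner = unpack_views(definitions[name], definitions, _seen + (name,))
        return f"({inner})"

    return REFERENCE.sub(replace, sql)


# Assumed to be fetched ahead of time; hard-coded here for illustration.
definitions = {
    "views.myview": (
        "SELECT Id AS Id, MAX(Time) AS MostRecentTime "
        "FROM [dataset.mytable] GROUP BY Id"
    ),
}

query = (
    "SELECT * FROM [dataset.mytable] tbl "
    "JOIN [views.myview] mview ON tbl.Time = mview.MostRecentTime"
)

unpacked = unpack_views(query, definitions)
print(unpacked)
```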

SSRS - Report doesn't load via Report Builder, while it does via SQL

This is my first question here.
I have struggled for days trying to find a solution, with no success.
Basically I have a standard stored procedure that returns a report dataset in a few seconds (5-6 seconds).
It aggregates (GROUP BY and SUM) 23,000 rows.
My final dataset comes out with 4 rows and 33 columns, executing, as said, in 5-6 seconds.
Unfortunately, when I try to load it via ReportBuilder, it loads endlessly (looking at SQL Server, the stored procedure remains stuck in a RUNNING status forever).
Everything in ReportBuilder (DB access, dataset, parameters, matrix...) is configured correctly: I was able to load the report until I added a few (4) additional fields.
The SQL dataset is basically something like:
PARAMETERS DECLARATION
SELECT
FIELDS
FROM
(SELECT
FIELD A
SUMS
FROM
TABLE
JOIN TABLES
WHERE
PARAMETERS MATCHING
GROUP BY A
) AS B
ORDER BY FIELD
An "external layer" SELECT was needed to make some calculations on some FIELDS, also in some cases using some PARAMETERS.
That's it.
I am used to working with huge datasets, sometimes pulling 30,000 rows with 110 fields, and if something loads via SQL it always loads via ReportBuilder too: this is the very first time it has behaved differently.
So I'm asking whether there are some SSRS/ReportBuilder limitations I have never run into.
Any help would be really appreciated!
Thanks in advance to everyone who spends time on this :)

SQLite view across multiple databases. Is this okay? Is there a better way?

Using SQLite I have a large database split into years:
DB_2006_thru_2007.sq3
DB_2008_thru_2009.sq3
DB_current.sq3
They all have a single table called hist_tbl with two columns (key, data).
The requirements are:
1. to be able to access all the data at once.
2. inserts only go to the current version.
3. the data will continue to be split as time goes on.
4. access is through a single program that has exclusive access.
5. the program can accept some setup SQL but needs to run the same when accessing one database or multiple databases.
To view them cohesively I do the following (really in a program but command line shown here):
sqlite3 DB_current.sq3
attach database 'DB_2006_thru_2007.sq3' as hist1;
attach database 'DB_2008_thru_2009.sq3' as hist2;
create temp view hist_tbl as
select * from hist1.hist_tbl union
select * from hist2.hist_tbl union
select * from main.hist_tbl;
There is now a temp.hist_tbl (view) and a main.hist_tbl (table).
When I select without qualifying the table I get the data thru the view.
This is desirable since I can use my canned sql queries against either the joined view or the individual databases depending on how I setup. Additionally I can always insert into main.hist_tbl.
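The setup described above can be reproduced from a program roughly as follows. This is a minimal sketch using Python's sqlite3 with throwaway files in a temporary directory; the sample keys and the helper `make_db` are hypothetical. It relies on SQLite searching the temp schema before main when a table name is unqualified.

```python
import os
import sqlite3
import tempfile


def make_db(path, rows):
    """Create a database file holding a hist_tbl with the given rows."""
    con = sqlite3.connect(path)
    con.execute("create table hist_tbl (key text, data text)")
    con.executemany("insert into hist_tbl values (?, ?)", rows)
    con.commit()
    con.close()


workdir = tempfile.mkdtemp()
old1 = os.path.join(workdir, "DB_2006_thru_2007.sq3")
old2 = os.path.join(workdir, "DB_2008_thru_2009.sq3")
current = os.path.join(workdir, "DB_current.sq3")
make_db(old1, [("k2006", "a")])
make_db(old2, [("k2008", "b")])
make_db(current, [("k2010", "c")])

con = sqlite3.connect(current)
con.execute("attach database ? as hist1", (old1,))
con.execute("attach database ? as hist2", (old2,))
con.execute("""
    create temp view hist_tbl as
    select * from hist1.hist_tbl union
    select * from hist2.hist_tbl union
    select * from main.hist_tbl
""")

# Inserts are qualified with main, so they go to the current database only.
con.execute("insert into main.hist_tbl values (?, ?)", ("k2011", "d"))
con.commit()

# Unqualified name resolves to the temp view; main.hist_tbl is the real table.
all_keys = [r[0] for r in con.execute("select key from hist_tbl order by key")]
main_keys = [r[0] for r in con.execute("select key from main.hist_tbl order by key")]
con.close()

print(all_keys)
print(main_keys)
```

Note that UNION removes duplicate rows across the databases; UNION ALL skips that step if duplicates cannot occur or should be kept.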
Question 1: What are the downsides?
Question 2: Is there a better way?
Thanks in advance.
Question 1: What are the downsides?
You have to update the view EVERY. FISCAL. year.
Question 2: Is there a better way?
Add a date column so you can search for things within a given timespan, like a fiscal year.
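The date-column suggestion could look something like the sketch below: one table, one indexed date column, and a range query per timespan. It uses Python's sqlite3 with an in-memory database; the column name `entry_date`, the sample rows, and the July-to-June fiscal year are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Single table with an extra date column stored as ISO-8601 text.
con.execute("create table hist_tbl (key text, data text, entry_date text)")
con.executemany(
    "insert into hist_tbl values (?, ?, ?)",
    [
        ("k1", "a", "2007-03-14"),
        ("k2", "b", "2008-09-01"),
        ("k3", "c", "2009-02-20"),
        ("k4", "d", "2010-05-05"),
    ],
)
# An index keeps range lookups cheap as the table grows.
con.execute("create index hist_tbl_entry_date on hist_tbl (entry_date)")


def keys_between(start, end):
    """Return keys whose entry_date falls in the half-open range [start, end)."""
    cur = con.execute(
        "select key from hist_tbl "
        "where entry_date >= ? and entry_date < ? "
        "order by entry_date",
        (start, end),
    )
    return [row[0] for row in cur]


# Hypothetical fiscal year running from July 2008 to June 2009.
fiscal_2008 = keys_between("2008-07-01", "2009-07-01")
print(fiscal_2008)
```

ISO-8601 dates compare correctly as text, which is why plain string comparison is enough here.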