Tableau count values after a GROUP BY in SQL

Tableau count values after a GROUP BY in SQL - sql

I'm using Tableau to show some schools data.
My data structure gives a table that has all de school classes in the country. The thing is I need to count, for example, how many schools has Primary and Preschool (both).
A simplified version of my table should look like this:
In that table, if I want to know the number needed in the example, the result should be 1, because in only one school exists both Primary and Preschool.
I want to have a multiple filter in Tableau that gives me that information.
I was thinking in the SQL query that should be made and it needs a GROUP BY statement. An example of the consult is here in a fiddle: Database example query
In the SQL query I group by id all the schools that meet either one of the conditions inside de IN(...) and then count how many of them meet both (c=2).
Is there a way to do something like this in Tableau? Either using groups or sets, using advanced filters or programming a RAW SQL calculated fiel?
Thanks!
Dubafek
PS: I add a link to my question in Tableu's forum because you can download my testing workbook there: Tableu's forum question

I've solved the issue using LODs (specifically INCLUDE and EXCLUDE statements).
I created two calculated fields having the aggregation I needed:
Then I made a calculated field that leaves only the School IDs that matches the number of types they have (according with the filtering) with the number of types selected in the multiple filter (both of the fields shown above):
Finally, I used COUNTD([Condition]) to display the amounts of schools matching with at least the School types selected.
Hope this helps someone with similar issue.
PS: If someone wants the Workbook with the solution I've uploaded it in an answer in the Tableau Forum

Related

How to populate all possible combination of values in columns, using Spark/normal SQL

I have a scenario, where my original dataset looks like below
Data:
Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,15.2
Argentina,Vegetable,2011,Yield,40.5
Argentina,Vegetable,2011,Production,2.66
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
Image of the above csv:
Now there is a very small country lookup table which has all possible countries the source data can come with, listed. PFB:
I want to have the output data's number of columns always fixed (this is to ensure the reporting/visualization tool doesn't get dynamic number columns with every day's new source data ingestions depending on the varying distinct number of countries present).
So, I've to somehow join the source data with the country_lookup csv and populate all those columns with default value as F. Every country column would be binary with T or F being the possible values.
The original dataset from the above has to be converted into below:
Data (I've kept the Amount field unsolved for column Type having Derived Yield as is, rather than calculating them below for a better understanding and for you to match with the formulae):
Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
US,Vegetable,2011,Yield,18,T,F,F,F,F,F
US,Vegetable,2011,Production,3,T,F,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
The image of the above expected output data for a structured look at it:
Part 1 -
Part 2 -
Formulae for populating Amount Field for Derived Type:
Derived Amount = Sum of Harvested of all countries with T (True) grouped by Year and Commodity columns divided by Sum of Production of all countries with T (True)grouped by Year and Commodity columns.
So, the target is to have a combination of all the countries from source and calculate the sum of respective Harvested and Production values which then has to be divided. The commodity can be more than one in the actual scenario for any given country, but that should not bother as the summation of amount happens on grouped commodity and year.
Note: The users in the frontend can select any combination of countries. The sole purpose of doing it in the backend rather than dynamically doing it in the frontend is because AWS QuickSight (our visualisation tool), even though can populate sum on selected column filters but doesn't yet support calculation on those derived summed fields. Hence, the entire calculation of all combination of countries has to be pre-populated (very naive approach) in order to make it available in report on dynamic users selection of countries.
Also if you've any better approach (than the above naive approach mentioned in note) to solve this problem, you are most welcome to guide me. I've also posted a question on the same problem without writing my expected approach for experts to show me the path on how we can solve this kind of a problem better than this naive approach. If you want to help solve it with some other technique, you're most welcome, here is the link to that question.
Any help shall be greatly acknowledged.

Using MS Access to obtain data across linked tables

I'm new to MS Access and am trying to speed up a data gathering process that is taking forever in Powershell. In Powershell I have 10 or so web API calls to get data and each comes back as an object with multiple properties (fields.) Each set of data has related fields to 1 or more of the other sets of data. Getting the data is very quick but piping an array of objects to where-object to select-object takes over an hour and there's really not that much data. Each object contains 500-1500 "records" and 5 to 10 "fields" so I thought why not export that data and use something that's intended to search through data to do the job? I exported each object as a separate .CSV file. So enter MS Access..
I imported each of the CSV's as a separate table (easy enough.) I'm going to simplify this down for this example to the following 3 tables:
[Tables]https://i.stack.imgur.com/UCH1F.jpg
Every table has fields that relate it over to other tables. Pretty much there's some sort of Id field in every table that is related to another Id field in a different table that I need to pull a field called "name" from. I'm trying to follow the bread crumbs from the Player name to it's Network name to it's Application name, to it's Layout name, etc... I want to build a query that I would eventually just be able to export as an Excel file. I also would prefer to just write out the SQL unless it's really easier to to understand the visual query builder. I'm looking to build a sheet with the following information:
Player's Name would include all names from the Players table and getting just that data makes sense to me. SELECT Name AS PlayerName FROM Players Everything else, not so much. I feel like this will end up being some mega query as I get deeper into related table after related table. In Excel, it would be straightforward using Vlookups across tabs but that doesn't seem to be the best approach. Given the info above, I'm trying to achieve the following output:
Result table
Any help with strategy and syntax greatly appreciated!

You're looking for the JOIN clause.
SELECT
Players.Name PlayerName, Networks.Name PlayerNetwork, Applications.Name ApplicationName
FROM
Players
LEFT OUTER JOIN
Networks
ON
Networks.ID = Players.NetworkId
LEFT OUTER JOIN
Applications
ON
Applications.Id = Players.ApplicationID

SQL Server 2012 Managment Studio Basic SELECT queries

I am new to SQL Server. I have been assigned to do some simple queries to start off, then eventually move on to more complex queries.
I have spent a lot of time on this website: http://www.w3schools.com and I understand it, I think, but then when I go back to my company's database, I find myself searching from many, many, different tables with different information.
For example, a table would say [Acct_Name] and the query comes back with not the correct account name (s) that I need. Any advice that you think might help me? Thank you.

It sounds like you are looking to limit your results to specific accounts. There are many ways to go about this, so no one will be able to give you a all encompassing answer but if you are looking to just pull a single account
SELECT * FROM (your table name) WHERE Acct_Name = 'the account name'
The * means you are selecting all columns in the table and your WHERE clause is where you set your search conditionals, like account name or by account ID. If you had a account creation date, you could get all accounts created on or before a date like this
SELECT * FROM (your table name) WHERE Created < '2016-06-01 00:00:00'
Replace the column name 'Created' with the column that holds the date field of account creation
Learning the WHERE clause and what you can do there to limit your results will get you on a solid footing to start, from there you will want to learn JOINs and how to link tables by primary keys.
Code academy has some great tutorials https://www.codecademy.com/learn/learn-sql

exporting only rows from sql in phpmyadmin, only where a certain column has Boolean of 0

"meta/background about the use of code and person using it"
1.site built by professional that left company,
2.I am inexperienced but trying/ want to learn,
3.Customer support site for service reps,
................................................
What im trying to do exactly per stackoverflows parameters.
We have a drop down box listing issues that the customer had in a column labeled "issue_type". I can export via csv entire table load onto excel then give to boss for overall review of what the issues were. However data base has a "hide" column. Its function is that when the row is updated the record is kept but the same "job or call" has only one viewable report on site (the most recently updated one). Hide is a boolean. In conclusion I want to export rows that only has the "hide" column Boolean status at 0, AND to only export the columns "customer", and "issue_type". I can seem to only do one or the other. and have researched a minimum of 4 hours to find answer myself and cannot find a syntax to do both at the same time with phpmyadmin.
I dont want an enormous data that is mostly useless but for issue type and customer but i will have to manually delete all the rows with hide = 1?
Thanks anyone 1st attempt question sorry if not correct for stackflow.

SELECT Customer,Issue_type FROM tickets where hide =0;
Elaborating on what is above for anyone that may be looking for a similar answer, SQL supports the "where" clause of which you can when properly syntaxed select many of your columns and their associated strings, booleans, and numbers to = what your looking for. Wildcards I found later for other uses work as well.
Sorry about the self answer but hopefully someone finds this usefull

SQL IN statement "inclusiveness"

I'm not a programmer, but trying to learn. I'm a nurse, and need to pull data for medical referral tracking from a database. I have a piece of GUI software which builds JOIN queries for me to pull things from the database. One of the operators I can use in the drop-down is "IN." The referral documentation is stored in the table as codes made up of one to three letters. For example, the code for a completed dental referral is CDF, and the code for a dental referral is D.
I want to build a report to allow other nurses to pull all their outstanding referrals, so I'll want to pull "D" but not "CDF"
If I use IN as the operator, and set my parameters to 'S','D','BP' {etc} will that also pull the records which have the other, longer codes which contain those same letters? (like CDF, CSR, CBP)
I don't want to test it because I only have access to the production database, and I don't want to hose up actual patient records. Thanks in advance for any help!

Assuming that the column that holds the referral code holds one and only one code per record (which is what it sounds like) the query should function as you want and will not attempt to match substrings.
In any event, there's no danger that a query in the form IN ('S', 'D', 'BP') will match substrings. To perform substring matches in SQL you have to use the LIKE operator.
The situation in which this will not work is if the referral code column holds multiple codes separated by commas. This is an all-too-common mistake in designing databases but if the product you're using is commercial rather than home-grown, I think it's very unlikely to be the case. If it is, searching it is much more difficult.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas