Using a SQL subquery in Tableau

I am trying to calculate a new field in Tableau, similar to a subquery in SQL. However, my numbers are not matching up when I try to do this. I am stuck at this point and would like to see what others have done.
For reference, below is the subquery that I am trying to duplicate in Tableau.
select
((sum(n.non_influenced_sales*n.calls)/sum(n.calls)) - (sum(n.average_sales*n.calls)/sum(n.calls))) /
(sum(n.non_influenced_sales*n.calls)/sum(n.calls)) as impact
from (select
count(d.id) as calls
,avg(d.sale) as average_sales
,avg(case when non_influenced=1 then d.sale else null end) as non_influenced_sales
from data d
group by skill) n
When I build the same calculation in Tableau, I am able to get the same results as long as I leave out the group by skill. However, once I try to group by skill, my attempts to match the numbers have not worked.
The closest I have come is when I try to control the level of detail by using an INCLUDE expression. Tableau code:
{INCLUDE [skill] : ([non_influenced_sales]-[average_sales])/[non_influenced_sales]}
However, neither this nor FIXED has worked, and I can't match the numbers I am getting from SQL.
FYI, Impact is an aggregated measure. I built the subquery part in Tableau by creating separate calculated fields for each piece of the calculation. So for example:
Non Influenced Sales calculated in Tableau:
avg(if [non_influenced]=1 then [non_influenced_sales] end)
However, I am not sure if this matters or not.
I have also tried creating custom SQL. I am able to get a rolled-up version across all of the dates that is correct. But when I want to drill down to different dates or use other filters, things get messy very quickly. I have tried building relationships on a date level, but that hasn't worked either.
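Roughly, the shape of the custom SQL I tried is something like this (call_date is just a placeholder for whatever date column is being filtered on; the rest of the names match the query above). It keeps skill and the date at the row level so Tableau can filter and re-aggregate:
select
    d.skill,
    d.call_date,
    count(d.id) as calls,
    avg(d.sale) as average_sales,
    avg(case when d.non_influenced = 1 then d.sale else null end) as non_influenced_sales
from data d
group by d.skill, d.call_date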
Is there an easier way to do this?

How can I use data from more than one measurement in a single Grafana panel?

I am attempting to create a gauge panel in Grafana (version 6.6.2; for the purposes of this problem, presume that upgrading is a last resort, but possible if necessary) that represents the percentage of total available memory used by the Java Virtual Machine running a process of mine. The problem that I am running into is the following:
I have used Spring Boot Actuator's metrics and imported them into an Influx database with Micrometer, but in the process the two values that I would like to use in my calculation have been stored in two different measurements: jvm_memory_used and jvm_memory_max.
My initial idea was simply to call a SELECT on both of the measurements to get the values that I want, then divide "used" by "max" and multiply that value by 100 to get the percentage to display. Unfortunately, I run into syntax errors when I try to do this manually, and I am unsure whether I can do this using Grafana's query builder.
I know that the syntax is incorrect, but I am not familiar enough with InfluxQL to know how to properly structure this query. Here is what I had tried:
(SELECT last("value")
FROM "jvm_memory_used"
WHERE ("area" = 'heap')
AND $timeFilter
GROUP BY time($__interval) fill(null)
) /
(SELECT last("value")
FROM "jvm_memory_max"
WHERE ("area" = 'heap')
AND $timeFilter
GROUP BY time($__interval) fill(null)
)
(The AND and GROUP BY clauses are there because of the defaults from Grafana's query builder; I am not sure whether they are necessary or not.)
I'm assuming that my use of parentheses and division is illegal here, but I am not sure how to resolve it.
How can I divide these two values from separate tables?
EDIT: I have gotten slightly further but it seems that there is a new issue. I now have the following query that I am sending in:
SELECT 100 * (last("used") / sum("max")) AS "percentUsed"
FROM(
SELECT last("value") AS "used"
FROM "jvm_memory_used"
WHERE ("area" = 'heap')
AND $timeFilter
),(
SELECT last("value") AS "max"
FROM "jvm_memory_max"
WHERE ("area" = 'heap')
AND $timeFilter
)
GROUP BY time($__interval) fill(null)
and the result I get is two gauges showing nulls (screenshot not reproduced here).
How can I now get this query to return only one gauge with data, instead of two with nulls?
I've accepted an answer that works for Grafana version 7 and later. If there are any other answers that do not involve updating the version of Grafana, please provide them as well!
I am not particularly experienced with Influx, but since your question is how to use/combine two measurements (query results) in a Grafana panel, I can tell you about one approach:
You can use a transformation. That way, you can keep the two separate queries. With the transformation mode Binary operation you can simply divide one of your values by the other.
In your specific case, to display the result as a percentage, you can then use Percent (0.0-1.0) as the unit, and you should have accomplished your goal.

Multiple subtotals - Rollup order of fields

I am trying to run a query that aggregates data, groups the results by several different fields, and extracts all relevant "subtotal" permutations (similar to CUBE() in MSSQL).
When using GROUP BY ROLLUP(), I only get the permutations that follow the order of the fields listed in the ROLLUP function.
For example, the query below (which runs on a public dataset) returns subtotals by year, by year and month, and by year, month and medallion... but it doesn't subtotal by medallion alone.
SELECT
trip_year,
trip_month,
medallion,
SUM(trip_count) AS Sum_trip_count
FROM
[nyc-tlc:yellow.Trips_ByMonth_ByMedallion]
WHERE
medallion IN ("2R76", "8J82", "3B85", "4L79", "5D59", "6H75", "7P60", "8V48", "1H12", "2C69", "2F38", "5Y86", "5j90", "8A75", "8V41", "9J24", "9J55", "1E13", "1J82")
GROUP BY
ROLLUP(trip_year,
trip_month,
medallion)
My question is:
What should I do in order to get all the different permutations of subtotals in a single query result?
Already tried: a UNION with a similar query using a different field order. It works, but it is not elegant (it would require too many unions).
Thanks
You are correct on both counts. In BigQuery, ROLLUP respects the hierarchy, treating the listed fields as a strictly ordered list. Their order will not be changed during aggregation.
The CUBE aggregate commonly found in other SQL environments is unordered and in fact aggregates every possible subset of its listed fields. At this time, CUBE has not been implemented in BigQuery. The workaround you suggest is also what I would suggest: UNION all the result sets from ROLLUP, using each permutation of its contained fields. Albeit not ideal, it should give you the same results.
In short, a UNION of several queries with different permutations of the ROLLUP fields is the only way to achieve this at the moment. The downsides, as you state, are that this may be difficult to maintain and that the queries become more expensive.
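As a rough sketch of that workaround, shown with just two of the possible orderings (depending on whether you run legacy or standard SQL you may need to adjust the table reference and the way the union is expressed):
SELECT trip_year, trip_month, medallion, SUM(trip_count) AS Sum_trip_count
FROM [nyc-tlc:yellow.Trips_ByMonth_ByMedallion]
WHERE medallion IN ("2R76", "8J82")  -- same full list as in the question
GROUP BY ROLLUP(trip_year, trip_month, medallion)
UNION ALL
SELECT trip_year, trip_month, medallion, SUM(trip_count) AS Sum_trip_count
FROM [nyc-tlc:yellow.Trips_ByMonth_ByMedallion]
WHERE medallion IN ("2R76", "8J82")  -- same full list as in the question
GROUP BY ROLLUP(medallion, trip_year, trip_month)
Note that the rows every ROLLUP branch produces (the grand total and the fully grouped rows) will appear once per branch, so you may want to de-duplicate them or tag each branch with a label.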
If you would like to see CUBE implemented in BigQuery, I strongly encourage you to file a feature request on the BigQuery public issue tracker. Be sure to include a thorough use case in this request.
UPDATE: To support the feature request filed by the OP, please star it and you'll receive notifications with updates.

Query to Find Adjacent Date Records

There exists in my database a page_history table; the idea is that whenever a record in the page table is changed, that record's old values are stored in the history table.
My job now is to find occasions in which a record was changed, and retrieve the pre- and post-conditions of that change. Specifically, I want to know when a page changed groups, and what groups were involved in the change. The query I have below can find these instances, but with the use of the min function, I can only get back the values that match between the two records:
select page_id,
original_group,
min(created2) change_date
from (select h.page_id,
h.group_id original_group,
i.group_id new_group,
h.created_dttm created1,
i.created_dttm created2
from page_history h,
page_history i
where h.page_id = i.page_id
and h.created_dttm < i.created_dttm
and h.group_id != i.group_id)
group by page_id, original_group, created1
order by page_id
When I try to select, say, any details of the second record, like new_group, I'm hit with an ORA-00979: not a GROUP BY expression error. I don't want to group by new_group, though, because that would destroy the logic (I think it would then find every time a page changed from one group to another, regardless of any changes to other groups in between).
My question, then, is how can I modify this query, or go about writing a new one, that achieves a similar end, but with the added availability of columns that do not match between the two records? In essence, how can I find that min record without sacrificing all the other columns I'm not trying to compare? I don't exactly need a complete answer, any suggestions that point me in the right direction would be appreciated.
I use PL/SQL Developer, and it looks like version 11.2.0.2.0 of Oracle.
EDIT: I have found a solution. It's not pretty, and I'd still like to see some alternatives, but if helping me out would threaten to explode your brain, I would advise relocating to an easier question.
Without seeing your table structure it's hard to rewrite the query, but when you have a MIN function used like that, it usually works out better to put it into a separate sub-select to get what you want and then compare against the result of that.
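A hedged sketch of that idea, reusing the aliases and column names from your query (untested, since the table structure isn't shown): the MIN moves into a correlated sub-select, so the outer query no longer needs a GROUP BY and new_group stays available.
select h.page_id,
       h.group_id     as original_group,
       i.group_id     as new_group,
       i.created_dttm as change_date
from page_history h,
     page_history i
where h.page_id = i.page_id
  and h.created_dttm < i.created_dttm
  and h.group_id != i.group_id
  -- keep only the earliest later row with a different group
  and i.created_dttm = (select min(i2.created_dttm)
                        from page_history i2
                        where i2.page_id = h.page_id
                          and i2.created_dttm > h.created_dttm
                          and i2.group_id != h.group_id)
order by h.page_id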

How could I write this code in a more performant way?

In our app, people have one or more projects. These projects have a start and an end date. People have a limited number of available days.
Now we have a page that displays the availability of a given person on a week-by-week basis. It currently shows 18 weeks.
The way we currently calculate the available time for a given week is like this:
def days_available(query_date=Date.today)
days_engaged = projects.current.where("start_date < ? AND finish_date > ?", query_date, query_date).sum(:days_on_project)
available = days_total - hours_engaged
end
This means that to display the page described above, the app fires 18(!) queries at the database. We also have pages that list the availability of multiple people in a table. For those pages the number of queries quickly becomes staggering.
It is also quite slow.
How could we handle the availability retrieval in a more performant manner?
This is quite a common scenario when working with date ranges on an entity. The easiest and fastest way is in SQL:
Join your events to a generated date table (see "generate days from date range") so that you have a row for each day a person is occupied. Once you have the data in this form, it is simply a matter of grouping by the week part of the date and counting the rows per group.
You can extend this to group by person for multiple-person queries.
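A minimal sketch of that shape, assuming a pre-generated calendar table called calendar_dates with one row per day in a day column, a person_id column on projects, and PostgreSQL-style date functions (all of these are assumptions; adjust to your schema and database):
-- one row per person per occupied day, rolled up per week
select p.person_id,
       date_trunc('week', c.day) as week_start,
       count(*)                  as days_engaged
from projects p
join calendar_dates c
  on c.day > p.start_date
 and c.day < p.finish_date
group by p.person_id, date_trunc('week', c.day)
order by p.person_id, week_start
Available days per week are then days_total minus days_engaged, and the whole 18-week page (or the multi-person table) comes back from a single query.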
From a SQL point of view, I'd advise using a stored procedure and passing in your date/range requirement; you can then return a recordset for a user, or possibly multiple users. This way your code only has to hit the database once.
You can then output the recordset data in one go by iterating through it.
Hope this helps.
Use a stored procedure to send your query to SQL and fetch the data.
Pass parameters to the SQL query; in your case that is today's date.
Apply your conditions and logic in the SQL stored procedure. Using a procedure is a good and fast way to retrieve data from SQL, and it will also help protect your code from SQL injection.
Call that stored procedure from your code. As I don't know Ruby on Rails, I can't provide the steps for calling a stored procedure from it.
After that, the data fetched by your stored procedure will be available in a data table or something similar.
After getting the data you can perform whatever you need.
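For illustration only, a minimal sketch of such a procedure in MySQL syntax (the table and column names come from the question; your database and dialect may differ):
DELIMITER //
CREATE PROCEDURE days_engaged_for(IN query_date DATE)
BEGIN
  -- total days on projects that span the given date
  SELECT SUM(days_on_project) AS days_engaged
  FROM projects
  WHERE start_date < query_date
    AND finish_date > query_date;
END //
DELIMITER ;
-- called as:
CALL days_engaged_for(CURDATE());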
Hope this helps
First, see what query is actually executed. Then you can run EXPLAIN on that query:
explain select * from project where start_date < any_date and end_date > any_date2
This shows you the query plan; use that plan to optimize your query.
For example:
If you have an index on the end_date field, write the condition as (end_date > any_date2 and start_date < any_date). That step will use the index if one exists on this field. But this is database-dependent; the example is for MySQL. If you want MySQL to use the index, the indexed condition should be on the left-hand part of the WHERE clause.
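If no such index exists yet, creating one is a one-liner; a sketch using the names from the EXPLAIN example above (substitute your real table and column names, e.g. finish_date from the question's query):
create index idx_project_dates on project (end_date, start_date);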
There's not really enough information in your question to know exactly what you're trying to achieve here; for example, the code snippet doesn't actually use the result of the database query (it assigns days_engaged but subtracts hours_engaged), so you could just remove the query to make it faster. Perhaps this is just a bug in the code you posted?
Having said that, there are some techniques you should look into to implement your functionality.
I would take a look at using data warehouse techniques. I would think of your 'availability information' as a Fact table in a star schema, with 'Dates' and 'People' as Dimension tables.
You can then use queries to get things like the list of users on a project for a given week, and their availability.
Data warehousing has a whole bunch of resources you can tap into to help make this perform well, though there is also a lot of terminology that can be confusing. But for this type of 'I need to slice and dice my data across several sets of things (people and time)' problem, data warehousing techniques can be quite powerful.
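To make that concrete, a hedged sketch of such a star schema; every name here is illustrative rather than taken from the question:
-- dimension tables
create table dim_date   (date_id int primary key, day date, week_start date);
create table dim_person (person_id int primary key, name varchar(100), days_total int);
-- fact table: one row per person per engaged day (or per week, if pre-aggregated)
create table fact_availability (
    date_id      int references dim_date(date_id),
    person_id    int references dim_person(person_id),
    days_engaged numeric
);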
As I don't know Ruby on Rails, from a SQL point of view I suggest you write a stored procedure and return a dataset, then do the necessary table operations on the dataset from the front end. It will reduce the unnecessary calls to the DB.

Select IIF SUM command

I am using Jet SQL from Excel over an ADODB connection to an IBM AS/400 server to get some data. I have done this fine before, and it works with all other Jet SQL commands, but I have run into a problem that I am unable to solve. It is quite simple, so I imagine that I am just not using the correct syntax. What I am trying to do is get some totals.
I have a table that contains part numbers and quantities for each location of that part (more than one location per part). My goal is to have a SQL command grab the total quantity (summing all locations) per part. I am able to do this one part at a time successfully using the following (for simplicity I will use part numbers 12345678 and 01234567):
SELECT SUM(CPJDDTA81.F4101JD.LIPQOH) FROM CPJDDTA81.F4101JD WHERE CPJDDTA81.F4101JD.IMLITM = '12345678'
CPJDDTA81.F4101JD is my table, IMLITM is the column name of part numbers, LIPQOH is the quantity on hand per location.
The single search produces the sum I want; the problem comes when trying to get more than one sum in a single SQL command. I have tried using a SELECT IIF command like the following:
SELECT IIF(CPJDDTA81.F4101JD.IMLITM = '12345678',SUM(CPJDDTA81.F4101JD.LIPQOH),IIF(CPJDDTA81.F4101JD.IMLITM = '01234567',SUM(CPJDDTA81.F4101JD.LIPQOH),0) FROM CPJDDTA81.F4101JD
This command produces an error saying that "=" is not a valid token (the = sign within the IIF statement). I was hoping that someone out there could help me write a correct statement to accomplish this. My actual part list will be much larger, so I will be using VBA to construct the SQL statement, but I need to learn how to do two parts first. Thanks ahead of time.
SELECT CPJDDTA81.F4101JD.IMLITM, SUM(CPJDDTA81.F4101JD.LIPQOH) AS TotalQuantity
FROM CPJDDTA81.F4101JD
GROUP BY CPJDDTA81.F4101JD.IMLITM
Does the above help?
Additionally, the items can be limited by adding a WHERE clause.
SELECT CPJDDTA81.F4101JD.IMLITM, SUM(CPJDDTA81.F4101JD.LIPQOH) AS TotalQuantity
FROM CPJDDTA81.F4101JD
WHERE CPJDDTA81.F4101JD.IMLITM IN ('12345678', '01234567')
GROUP BY CPJDDTA81.F4101JD.IMLITM
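Should you really want one column per part rather than one row per part, note that the aggregate has to wrap the conditional, not the other way around. A hedged sketch, assuming IIF is accepted by your provider (otherwise the equivalent CASE expression works the same way):
SELECT SUM(IIF(CPJDDTA81.F4101JD.IMLITM = '12345678', CPJDDTA81.F4101JD.LIPQOH, 0)) AS Qty_12345678,
       SUM(IIF(CPJDDTA81.F4101JD.IMLITM = '01234567', CPJDDTA81.F4101JD.LIPQOH, 0)) AS Qty_01234567
FROM CPJDDTA81.F4101JD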