Expand-collapse report with data set based on GROUPING SETS

I've used the Expand/Collapse feature in SSRS reports before, but in all those cases it was Reporting Services that was doing the grouping and totalling. This time around I utilize GROUPING SETS in my dataset query to let SQL Server handle aggregating the data. I want to create a report that has Expand/Collapse features for the groups, but can't seem to get it to work.
Repro
First up, here's a way to get a small repro simulating my actual situation. Use the following query for a dataset:
-- Simulating with already denormalized data for sake of simplicity
DECLARE @Order TABLE (Category VARCHAR(20), Product VARCHAR(20), PersonId INT);

INSERT INTO @Order (Category, Product, PersonId)
VALUES ('Fruit', 'Banana', 1)
      ,('Fruit', 'Banana', 1)
      ,('Cakes', 'Chocolate', 1)
      ,('Fruit', 'Apple', 2)
      ,('Cakes', 'Chocolate', 2)
      ,('Cakes', 'Berry Jam', 3)
      ,('Cakes', 'Chocolate', 3)
      ,('Cakes', 'Chocolate', 3)
      ,('Fruit', 'Banana', 4)
      ,('Cakes', 'Berry Jam', 5);

SELECT Category,
       Product,
       COUNT(DISTINCT PersonId) AS NrOfBuyers
FROM @Order AS o
GROUP BY GROUPING SETS ((), (Category), (Category, Product));
This will provide the following output (I've manually ordered it to illustrate my intentions):
Category  Product    NrOfBuyers
--------  ---------  ----------
Fruit     Apple      1
Fruit     Banana     2
Fruit     NULL       3
Cakes     Berry Jam  2
Cakes     Chocolate  3
Cakes     NULL       4
NULL      NULL       5
To foreshadow what I'm aiming for, here's what I want to get in Excel.
Expanded version of intended result:
Collapsed version of intended result:
What I've tried so far:
While writing this question and creating the repro I did realize that my first approach of just dumping my dataset in a tablix was wrong.
So what I tried to fix this was recreating the tablix with proper Row Groups like so:
In addition to that I need a column on the left hand side outside the main group to hold the toggle "+" for the grand total row.
However, this gives incorrect numbers for the collapsed version:
These should be different: Cakes and Fruit should have a subtotal of 4 and 3, respectively.
This seems like a problem with ordering the rows, so I've checked the sorting for the tablix, which should order rows as they appear in the "intended result" screenshots. It doesn't, and after a bit I understood why: the groups do sorting as well. So I've added sorting for the groups too, e.g. this is the one for the Product row group:
This seems to improve things (it does the sorting I needed anyway), but it doesn't fix the wrong numbers in the collapsed state.
What do I need to do to finish this last stretch and complete the report?

The approach can work, but one last step is needed to get the correct numbers in the collapsed state. Note that, with the example from the question, this design:
Shows the following expression for this cell:
=Fields!NrOfBuyers.Value
But this sneakily seems to come down to this:
=First(Fields!NrOfBuyers.Value)
when it is evaluated in the context of a collapsed row.
So, one way to "fix" this and get the correct sub totals is to change that expression to:
=Last(Fields!NrOfBuyers.Value)
Which will give the desired output in collapsed state:
Or semi-collapsed:
And finally, expanded:
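For =Last() to be reliable, the subtotal row has to end up as the last row of each group in the dataset. One way to make that explicit (a sketch against the repro query above, not something the report strictly requires if the group sorting already handles it) is to order by GROUPING() in the dataset query itself:

SELECT Category,
       Product,
       COUNT(DISTINCT PersonId) AS NrOfBuyers
FROM @Order AS o
GROUP BY GROUPING SETS ((), (Category), (Category, Product))
ORDER BY GROUPING(Category),   -- grand total row last overall
         Category,
         GROUPING(Product),    -- category subtotal after its products
         Product;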

Related

Understanding why UNION is used in this SQL injection example

I'm trying to understand more about SQL injection, so I found this lesson from Red Tiger Labs.
According to the solution, the cat=1 part of the URL is vulnerable to SQL injection.
I can understand that you can append ORDER BY X# and keep incrementing X to establish the number of columns, which is 4.
However according to the solution, the next step is to do:
cat=1 union select 1,2,3,4 from level1_users #
The table name is provided, so that's ok. But I'm really having trouble understanding the purpose of the UNION. My guess is the underlying code does something like:
SELECT * FROM level1_users where cat=1
Presumably it would expect only 0 or 1 results. Then it prints out some number of columns onto the screen. According to the example, it prints out:
This hackit is cool :)
My cats are sweet.
Miau
3
4
The first three lines were printed out without the extra SQL injection. So what's going on, and what's the significance?
I would not expect the union to do anything; I assume the numbers refer to columns?
So, I've managed to figure out what's going on here.
cat=1 union select 1,2,3,4 from level1_users #
The select part selects the numbers 1, 2, 3, 4 as columns. You could actually use anything here, like select 'cats', 'fish', 'bread', 42 and sometimes you have to do this as the union select must match the column types in the target table. The level1_users table is integers (or at least, integers work), hence selecting numbers.
I actually thought it might be selecting columns by their index, because often in SQL you can do ORDER BY 1, for example, to order by the first column; however, that's not the case.
What tripped me up was that this particular SQL injection website dumps the entire contents of the result set to the screen, and I wasn't expecting that. If you think about it though it is looking for a category id and therefore it's not unreasonable to expect it to list everything in that category.
Performing the UNION first shows that extra rows will be printed to the screen, and because we've numbered the columns, it shows which of them actually end up on the page: columns 3 and 4.
From there it's possible to simply select the username and password into those columns (you have to guess the column names in this instance, because although you can normally UNION against the database's own metadata to discover them, that has been disabled for this exercise).
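Putting that together, the follow-up injection would look roughly like this (a sketch: username and password are the guessed column names, placed in positions 3 and 4 because those are the ones echoed to the page):

cat=1 union select 1,2,username,password from level1_users #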

Highlighting differences in child values for parent values in OBIEE or SQL

I can't seem to figure out how to select instances where values in the green circle would be highlighted/selected for every parent/orange-circle value. What would this sort of operation be called?
Trying to translate that into an understandable requirement: you want to select (i.e. filter for) "Load IDs" which have more than one "Purchase Order Number"?
That's how it reads to me from your data grid, because the PO Number is what changes the query grain and causes two rows; everything else is the same.
If that's the case then create a measure which counts the PO Numbers and filter on that.
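In plain SQL the same filter would look roughly like this (a sketch with hypothetical table and column names, since the actual OBIEE subject area isn't shown):

-- Load IDs with more than one distinct PO Number
SELECT load_id
FROM shipment_detail          -- hypothetical table name
GROUP BY load_id
HAVING COUNT(DISTINCT po_number) > 1;

In OBIEE the equivalent is the measure described above: a count (distinct) of PO Number per Load ID, with a "greater than 1" filter applied to it.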

Calculating and displaying customer lifetime value histogram with BigQuery and Data Studio

Consider a table in Google BigQuery containing purchase records for customers. For the sake of simplicity, let's focus on the following properties:
customer_id, product_id, amount
I'd like to create a Google Data Studio report from the above data set showing a customer lifetime value histogram. The customer lifetime value is the sum of amount for any given customer. The histogram would show how many customers fall into a certain bucket by their total amount - I would define the buckets like 0-10, 10-20, 20-30 etc. value ranges.
Like this:
Finally, I'd also like to filter the histogram by product_id. When the filter is active, the histogram would show the totals for customers who - at least once - purchased the given product.
As of this moment, I think this is not possible to implement in Data Studio, but I hope I am wrong.
Things I've tried so far:
Displaying an average customer lifetime value for the whole dataset is easy, via a calculated field in Data Studio such as SUM(amount) / COUNT(customer_id).
For creating a histogram, I don't see any way purely in Data Studio (based on the above data set). I think I need to create a view of the original table, consisting of a single row per customer with the total amount. The bucket assignment could be implemented either in BigQuery or in Data Studio with CASE ... WHEN.
However, for the final step, i.e. creating a product filter that filters the histogram for those customers who purchased the given product, I have no clue how to approach this.
Any thoughts?
I was able to do a similar reproduction to what you describe but it's not straightforward so I'll try to detail everything. The main idea is to have two data sources from the same table: one contains customer_id and product_id so that we can filter it while the other one contains customer_id and the already calculated amount_bucket field. This way we can join it (blend data) on customer_id and filter according to product_id which won't change the amount_bucket calculations.
I used the following script to create some data in BigQuery:
CREATE OR REPLACE TABLE data_studio.histogram
(
customer_id STRING,
product_id STRING,
amount INT64
);
INSERT INTO data_studio.histogram (customer_id, product_id, amount)
VALUES ('John', 'Game', 60),
('John', 'TV', 800),
('John', 'Console', 300),
('Paul', 'Sofa', 1200),
('George', 'TV', 750),
('Ringo', 'Movie', 20),
('Ringo', 'Console', 250)
;
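For reference, a quick query to sanity-check the per-customer totals this sample data produces (the bucket each customer lands in follows from simple addition):

SELECT customer_id, SUM(amount) AS total_amount
FROM data_studio.histogram
GROUP BY customer_id;
-- John   1160  -> '1000-1500'
-- Paul   1200  -> '1000-1500'
-- George  750  -> '500-1000'
-- Ringo   270  -> '0-500'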
Then I connect directly to the BigQuery table and get the following fields. Data source is called histogram:
We add our second data source (BigQuery) using a custom query:
SELECT
customer_id,
CASE
WHEN SUM(amount) < 500 THEN '0-500'
WHEN SUM(amount) < 1000 THEN '500-1000'
WHEN SUM(amount) < 1500 THEN '1000-1500'
ELSE '1500+'
END
AS amount_bucket
FROM
data_studio.histogram
GROUP BY
customer_id
With only the latter we could already do a basic histogram with the following configuration:
Dimension is amount_bucket, metric is Record count. I made a bucket_order custom field to sort it, since lexicographically '1000-1500' comes before '500-1000':
CASE
WHEN amount_bucket = '0-500' THEN 0
WHEN amount_bucket = '500-1000' THEN 1
WHEN amount_bucket = '1000-1500' THEN 2
ELSE 3
END
Now we add the product_id filter on top and a new chart with the following configuration:
Note that metric is CTD (Count Distinct) of customer_id and the Blended data data source is implemented as:
An example where I filter by TV so only George and John appear but the other products are still counted for the total amount calculation:
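Conceptually, the blend plus the product filter behaves like the following join (a sketch only; in the report it is done with Data Studio's data blending rather than SQL):

SELECT b.amount_bucket,
       COUNT(DISTINCT h.customer_id) AS customers
FROM data_studio.histogram AS h
JOIN (
  SELECT customer_id,
         CASE
           WHEN SUM(amount) < 500 THEN '0-500'
           WHEN SUM(amount) < 1000 THEN '500-1000'
           WHEN SUM(amount) < 1500 THEN '1000-1500'
           ELSE '1500+'
         END AS amount_bucket
  FROM data_studio.histogram
  GROUP BY customer_id
) AS b
ON b.customer_id = h.customer_id
WHERE h.product_id = 'TV'     -- the product filter
GROUP BY b.amount_bucket;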
I hope it works for you.

SQL Multiple IN statements on one column

Okay, I'm using WordPress, but this pertains to the SQL side.
I have a query in which I need to filter out posts using three different categories, but they're all terms in the post.
For example:
In my three categories, I select the following: (Academia,Webdevelopment) (Fulltime,Parttime) (Earlycareer).
Now what I want to do is make sure when I query that the post has AT LEAST ONE of each of those terms.
CORRECT RESULT: A post with tags Academia, Fulltime, Earlycareer
INCORRECT RESULT: A post with tags Academia, Earlycareer (doesn't have fulltime or parttime)
Currently, my query looks something like this:
SELECT * FROM $wpdb->posts WHERE
(
$wpdb->terms.slug IN (list of selected from category 1) AND
$wpdb->terms.slug IN (list of selected from category 2) AND
$wpdb->terms.slug IN (list of selected from category 3)
)
AND $wpdb->term_taxonomy.taxonomy = 'jobtype' AND .......
When using this query, it returns no results when I select across the different categories (that is, I can choose four things from category 1 and get results, but as soon as I also choose anything from category 2 or 3, nothing comes back, and vice versa).
I'm not sure if this is something to do with using IN more than once on the same column.
Thanks in advance for any help!
Your query seems to be correct. There are no limitations in SQL on using IN multiple times for the same column.
But make sure you don't have any NULL values in your "list of selected from category 1/2/3" lists. Even a single NULL value in these lists can make the whole WHERE condition evaluate to NULL, and you will get nothing as a result.
If this doesn't help, then it is probably a WordPress issue.
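For reference, the usual SQL pattern for "at least one term from each group" is to group by post and count matches per group. A minimal sketch with the standard WordPress tables spelled out (the wp_ prefix and lowercase slugs are assumptions, and this is not the exact $wpdb query):

-- Posts that have at least one term from each of the three selected lists
SELECT p.ID
FROM wp_posts p
JOIN wp_term_relationships tr ON tr.object_id = p.ID
JOIN wp_term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
JOIN wp_terms t ON t.term_id = tt.term_id
GROUP BY p.ID
HAVING SUM(t.slug IN ('academia', 'webdevelopment')) > 0
   AND SUM(t.slug IN ('fulltime', 'parttime')) > 0
   AND SUM(t.slug IN ('earlycareer')) > 0;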

looping through a numeric range for secondary record ID

So, I figure I could probably come up with some wacky solution, but I figure I might as well ask up front.
each user can have many orders.
each desk can have many orders.
each order has maximum 3 items in it.
Trying to set things up so a user can create an order, the order auto-generates a reference number, and each item gets a reference letter. The reference number is 0-99 and loops back around to 0 once it hits 99, so orders throughout the day are easy to reference for the desks.
So, user places an order for desk #2 of 3 items:
78A: red stapler
78B: pencils
78C: a kangaroo foot
Not sure if this would be done in the program logic or at the SQL level somehow.
I was thinking something like neworder = order.last + 1 and somehow tying that into a range on order create. Pretty fuzzy on the specifics.
Without knowing the answer to my comment above, I will assume you want to have the full audit stored, rather than wiping historic records; as such the 78A 78B 78C type orders are just a display format.
If you have a single Order table (containing your OrderId, UserId, DeskId, times and any other top-level stuff), an OrderItem table (containing your OrderItemId, OrderId, LineItemId -- 1, 2 or 3 for the first and optional second and third line items in the order -- and ProductId) and a Product table (ProductId, Name, Description),
then this is quite simple (thankfully) using the modulo operator, which gives the remainder of a division and lets you count in groups of 3 and 100 (or any other number you wish).
Just do something like the following:
(you will want to join the items into a single column, I have just kept them distinct so that you can see how they work)
Obviously join/query/filter on user, desk and product tables as appropriate
select
    o.OrderId,
    o.UserId,
    o.DeskId,
    o.OrderId % 100 + 1 as OrderNumber,
    case when oi.LineItemId % 3 = 1 then 'A'
         when oi.LineItemId % 3 = 2 then 'B'
         when oi.LineItemId % 3 = 0 then 'C'
    end as ItemLetter,
    oi.ProductId
from tb_Order o
inner join tb_OrderItem oi on o.OrderId = oi.OrderId
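As noted above, you'd normally join the number and letter into a single reference column; a sketch of that, reusing the same expressions (CONCAT is SQL Server 2012+ syntax, an assumption about your platform):

select
    concat(o.OrderId % 100 + 1,
           case when oi.LineItemId % 3 = 1 then 'A'
                when oi.LineItemId % 3 = 2 then 'B'
                else 'C'
           end) as ItemReference,   -- e.g. '78A'
    oi.ProductId
from tb_Order o
inner join tb_OrderItem oi on o.OrderId = oi.OrderId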
Alternatively, you can add the itemLetter (A,B,C) and/or the OrderNumber (1-100) as computed (and persisted) columns on the tables themselves, so that they are calculated once when inserted, rather than recalculating/formatting when they are selected.
This sort of breaks the best practice of storing raw data in the DB and formatting on retrieval; but if you are not going to update the data, and you are going to read it far more often than you write it, then I would break this rule and calculate the formatting at insert time.
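A sketch of that computed-column alternative, assuming the tb_Order / tb_OrderItem layout above (SQL Server syntax; adjust for your actual schema):

alter table tb_Order
    add OrderNumber as (OrderId % 100 + 1) persisted;

alter table tb_OrderItem
    add ItemLetter as (case when LineItemId % 3 = 1 then 'A'
                            when LineItemId % 3 = 2 then 'B'
                            else 'C'
                       end) persisted;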