Spitting long column values to managable size for presenting data neatly - sql

Hi I was wondering if there is a way to split long column values in this case I am using SSRS to get the distinct values with the number of product ID against a category into a matrix/pivot table in SSRS. The problem lies with the amount of distinct category makes it a nightmare to make the report look pretty shall we say. Is there a dynamic way to split the columns in say groups of 10 to make the table look nicer and easy to read. I was thinking of using in operator then the list of values but that means managing the data every time a new category gets added. Is there a dynamic way to present the data in the best way possible? There are 135 distinct category values
Also I am open to suggestions to make the report to nicer if anyone has any thoughts. I am new to SSRS and trying to get to grips with its.
Here is an example of my problem
enter image description here

Are your column names coming back from the database under the SubCat field you note in the comments above? If so I imagine your dataset looks something like this
Subcat | Logno
---------+---------------
SubCatA | 34
SubCatB | 65
SubCatC | 120
SubCatD | 8
SubCatE | 19
You can edit this so that there is an index of each individual category being returned also, using the Row_Number() function. Add the field
ROW_NUMBER() OVER (ORDER BY SubCat ASC) AS ColID
To your query. This will result in the following.
Subcat | LogNo | ColID
-----------+--------------+----------
SubCatA | 34 | 1
SubCatB | 65 | 2
SubCatC | 120 | 3
SubCatD | 8 | 4
SubCatE | 19 | 5
Now there is a numeric identifier for each column you can perform some logic on it to arrange itself nicely on the page.
This solution involves a Tablix, nested inside a Matrix nested inside a Matrix as follows
First create a Matrix (Matrix1), and set it’s datasource to your dataset. Set the Row Group Properties to group on the following expression where ‘4’ is the number of columns you wish to display horizontally.
=CInt(Floor((Fields!ColID.Value - 1) / 4))
Then in the data section of the Matrix (bottom right corner) insert a rectangle and on this insert a new Matrix (Matrix 2). Remove the leftmost row. Set the column header to be the Column Name SubCat. This will automatically set the column grouping to be SubCat.
Finally, in the Data Section of Matrix 2 add a new Rectangle and Add a Tablix on it. Remove the Header Row, and set it to be one column wide only. Set the Data to be the information you wish to display, i.e. LogNo.
Finally, delete the Leftmost and Topmost rows/columns from Matrix 1 to make it look tidier (Note Delete Column Row only! Not associated groups!)
Then when the report is run it should look similar to the following. Note in my example SubCat = ColName, and LogNo = NumItems, and I have multiple values per SubCat.
Hopefully you find this helpful. If not, please ask for clarification.

Can you do something like this:
The following gives the steps (in two columns, down then across)

Related

Changes to table not designed for SQL

I am supposed to do some changes to an enormous CSV file based on a different file. Therefore I chose to do it in SQL but after further consideration I am not sure how to proceed..
In the 1st table I have a list of contracts. Columns represent some segments the contract belongs to and some products that can be linked to the contract (example in the table below).
Here contract no. 1234 belongs to segments X1 and Y2. There is no product number 1 linked to it, but it has product number 2 linked to it. The product originaly ends on the 1st of January 2030.
cont_n|date|segment_1|segment_2|..|prod_1|date_prod_1|product_2|date_product_2|..
1234 |3011| X1 | Y2 |..| | |YES |01/01/2030 |..
The 2nd file is a list of combinations of segments and an indication how the "date" columns should be adjusted. The example shows following situation - if there is prod_2 linked to the contract which belongs to groups X1 and Y2, end the prod_2 this year. I need this result to alter table no. 1.
prod_no|segment_1|segment_2|result
prod_2 | X1 | Y2 | end the product on anniversary
Ergo I need to get to the result:
cont_n|date|segment_1|segment_2|..|prod_1|date_prod_1|product_2|date_product_2|..
1234 |3011| X1 | Y2 |..| | |YES |30/11/2019 |..
In the original files I have around 600k rows and more than 300 columns (meaning around 100 different products) in table 1 and around 800 possible combinations of segments in table 2.
The algorithm I need to implement (very generally):
for x=1 to 100
IF product_x = YES THEN date_product_x = date + "Seach for result in table2"
Is there a reasonable way how to change the "date_product_x" columns based on the 2nd table or would it be better to find a different solution?
Thanks a lot!
I can only give you a general approach, because the information in your question is general (for example, why does "end the product on anniversary" translate to "30/11/2019"? It's not explained in the question, so I assume you're going to be able to handle that part of the logic).
You can approach this by using an UNPIVOT on Table 1 to get a structure like:
cont_n | segment1 | segment2 | product_number | product_date
You will UNPIVOT..FOR date_product_1 thru date_product_100. You'll either have to type out all 100 column names, or use dynamic sql to build the whole thing.
You'll do some string manipulation to grab the "x" portion of "date_product_x", and turn it into "prod_x", and then you can join to the second table on the two segment columns and the "prod_x" column, get the result column value, and do whatever rules you're doing to get the value you want for date_product_x.
Finally, you take that result, and PIVOT it back to the one-row-per-contract form, and JOIN it to your original table to UPDATE the date_product_x columns.

SQL Report Builder Table Dynamic Color

This might be beyond the limits of report builder.
I have an SQL Report that generates a table. The table is a fixed number of columns with a dynamic number of rows. The point of the table is to show recipe information of a system and I would like to make it more clear to a user when recipe information has been changed.
Example
+-----------+-------+-----+
| row names | Col A | Col |
+-----------+-------+-----+
| row1 | 10 | 20 |
| row2 | 14 | 20 |
+-----------+-------+-----+
The value of colA has changed so I would like to either change the cell color of row2,colA or change the font. To make it clear to a user what has changed.
I would want to do this dynamically for ever row. Basically compare to the row before and determine if I need to change colors of any of the cells.
Assuming you just mean you want to compare the previous row shown in the report then you can this is easily.
To recreate the exmaple, I used the following query in a dataset called DataSet1.
DECLARE #t TABLE(RowID int, ColA int, ColB int)
INSERT INTO #t VALUES
(1,10,20),
(2,14,20),
(3,14,20),
(4,15,21),
(5,15,22)
SELECT * FROM #t
I then created a simple table with the results sorted by RowID.
I then changed the BackgroundColor property of the cell containing [ColA] to the following expression.
=IIF(
Fields!ColA.Value = Previous(Fields!ColA.Value)
OR Fields!RowID.Value = MIN(Fields!RowID.Value, "DataSet1")
, Nothing
, "Khaki"
)
I repeated this for Col B changing the field name as applicable.
The expression simply checks if the value is the same as the previous row. For the first row with will always be false so I put a check in to see if we are formatting the first row. So if either (a) we are on the first row or (b) the number is the same as the previous number in this column, then the background is set to Nothing (the default), if neither of the conditions are met we set the background to 'Khaki'.
The end result looks like this...
The another way would be modifying the .RDL file.The RDL file basically contains XML tags so if you open them in Notepad or Visual Studio using View Code .You will find a tag called BackgroundColor for each and every column .By giving the color code in the place of existing one will get you the desired color.you can design a custom color and use it in your report.

ssrs report report filter with no duplicates used in query

I am having an issue and I'm not sure how to solve it.
I have an SSRS report that pulls from a table. I want a parameter filter to show de-duplicated values based on available options in one of the columns.
So my dataset with a query like:
SELECT * FROM table1 WITH (NOLOCK) WHERE col1 IN (#param)
Then I want a parameter called param that gets its available and default values from col1 in the above data set and I want them to be de-duplicated.
From reading online I learned I have to create a dummy param and use VBA code to de-duplicate that list.
So I have these params:
param_dummy that gets its available and default values from col1 in the above dataset
param that gets a de-duplicate list from param_dummy using Code.RemoveDuplicates
But I'm having an issue with circular logic. param gets its value from param_default which gets its value from the dataset/query which uses param.
How can I solve this?
One thought is to remove the WHERE col1 IN (#param) and instead use a filter on the Tablix table in the SSRS report. This works but I am wondering how efficient it is.
And/or if anyone has any other suggestions I am all ears.
Updated to add more details...
So let us say I have a table in my DB like so:
| id | col1 | col2 |
|----|------|--------|
| 1 | a | hello |
| 2 | b | how |
| 3 | a | are |
| 4 | c | you |
| 5 | d | on |
| 6 | a | this |
| 7 | b | lovely |
| 8 | c | day |
What I want is:
a Tablix to show all the fields from the table
a filter where the user can select between the available dropdowns in col1 (de-duplicated)
a text filter that allows nulls where a user can filter on col2
the parameters will have default values so the table will load on page load
So I have a dataset with a query like so:
SELECT
*
FROM dbo.table1
WHERE col1 IN (#col1options) AND (#col2value IS NULL OR col2 = #col2value)
Then for col1options I would make available and default options be Get values from a query and I would use the above dataset and col1.
But this won't work since the query/dataset depends on col1options which gets its default values from the query/dataset.
I can use a second dataset but that means making multiple calls to the SQL server and I want to avoid that.
I'm not sure I understand your issue so this is a guess...
If you mean you want to be able to filter your data by choosing one or more entries from a specific column in the table, but this column has duplicates and you want your parameter list to not show duplicates then this is what do to.
Create a new report
Add dataset dsMain as SELECT * FROM myTable WHERE myColumn IN (#myParam)
Add dataset dsParamValues as SELECT DISTINCT myColumn FROM myTable ORDER BY myColumn
Edit the #myParam parameter properties and set the available and default values to a query, then choose dsParamValues
Add you table/matrix control and set it's dataset property to dsMain
Found an easier solution.
Follow this link to build the "dummy" hidden parameter, the visible paramter and the de-dupe VBA code
Add a tablix properties filter where param is in the visible / non-hidden parameter from above VBA (FYI double click to add parameter)
Adding via double click will append a (0) at the end, remove the (0)
It should work as expected at that point! You should be able to select one, some or all parameters and your report should update accordingly.

Is condensing the number of columns in a database beneficial?

Say you want to record three numbers for every Movie record...let's say, :release_year, :box_office, and :budget.
Conventionally, using Rails, you would just add those three attributes to the Movie model and just call #movie.release_year, #movie.box_office, and #movie.budget.
Would it save any database space or provide any other benefits to condense all three numbers into one umbrella column?
So when adding the three numbers, it would go something like:
def update
...
#movie.umbrella = params[:movie_release_year]
+ "," + params[:movie_box_office] + "," + params[:movie_budget]
end
So the final #movie.umbrella value would be along the lines of "2015,617293,748273".
And then in the controller, to access the three values, it would be something like
#umbrella_array = #movie.umbrella.strip.split(',').map(&:strip)
#release_year = #umbrella_array.first
#box_office = #umbrella_array.second
#budget = #umbrella_array.third
This way, it would be the same amount of data (actually a little more, with the extra commas) but stored only in one column. Would this be better in any way than three columns?
There is no benefit in squeezing such attributes in a single column. In fact, following that path will increase the complexity of your code and will limit your capabilities.
Here's some of the possible issues you'll face:
You will not be able to add indexes to increase the performance of lookup of records with a specific attribute value or sort the filtering
You will not be able to query a specific attribute value
You will not be able to sort by a specific column value
The values will be stored and represented as Strings, rather than Integers
... and I can continue. There are no advantages, only disadvantages.
Agree with comments above, as an example try to use pg_column_size() to compare results:
WITH test(data_txt,data_int,data_date) AS ( VALUES
('9999'::TEXT,9999::INTEGER,'2015-01-01'::DATE),
('99999999'::TEXT,99999999::INTEGER,'2015-02-02'::DATE),
('2015-02-02'::TEXT,99999999::INTEGER,'2015-02-02'::DATE)
)
SELECT pg_column_size(data_txt) AS txt_size,
pg_column_size(data_int) AS int_size,
pg_column_size(data_date) AS date_size
FROM test;
Result is :
txt_size | int_size | date_size
----------+----------+-----------
5 | 4 | 4
9 | 4 | 4
11 | 4 | 4
(3 rows)

Pulling items out of a DB with weighted chance

Let's say I had a table full of records that I wanted to pull random records from. However, I want certain rows in that table to appear more often than others (and which ones vary by user). What's the best way to go about this, using SQL?
The only way I can think of is to create a temporary table, fill it with the rows I want to be more common, and then pad it with other randomly selected rows from the table. Is there a better way?
One way I can think of is to create another column in the table which is a rolling sum of your weights, then pull your records by generating a random number between 0 and the total of all your weights, and pull the row with the highest rolling sum value less than the random number.
For example, if you had four rows with the following weights:
+---+--------+------------+
|row| weight | rollingsum |
+---+--------+------------+
| a | 3 | 3 |
| b | 3 | 6 |
| c | 4 | 10 |
| d | 1 | 11 |
+---+--------+------------+
Then, choose a random number n between 0 and 11, inclusive, and return row a if 0<=n<3, b if 3<=n<6, and so on.
Here are some links on generating rolling sums:
http://dev.mysql.com/tech-resources/articles/rolling_sums_in_mysql.html
http://dev.mysql.com/tech-resources/articles/rolling_sums_in_mysql_followup.html
I don't know that it can be done very easily with SQL alone. With T-SQL or similar, you could write a loop to duplicate rows, or you can use the SQL to generate the instructions for doing the row duplication instead.
I don't know your probability model, but you could use an approach like this to achieve the latter. Given these table definitions:
RowSource
---------
RowID
UserRowProbability
------------------
UserId
RowId
FrequencyMultiplier
You could write a query like this (SQL Server specific):
SELECT TOP 100 rs.RowId, urp.FrequencyMultiplier
FROM RowSource rs
LEFT JOIN UserRowProbability urp ON rs.RowId = urp.RowId
ORDER BY ISNULL(urp.FrequencyMultiplier, 1) DESC, NEWID()
This would take care of selecting a random set of rows as well as how many should be repeated. Then, in your application logic, you could do the row duplication and shuffle the results.
Start with 3 tables users, data and user-data. User-data contains which rows should be prefered for each user.
Then create one view based on the data rows that are prefered by the the user.
Create a second view that has the none prefered data.
Create a third view which is a union of the first 2. The union should select more rows from the prefered data.
Then finally select random rows from the third view.