Summing different parts of a column in SQL - sql

I have a database extract in excel and want to create a custom value in Tablue using their create calculation, which I believe is SQL based.
Basically I have a large number of feeds which all show up different amounts in a column. For example:
feed 1
feed 1
feed 2
feed 3
feed 4
feed 4
feed 4
And I want to have a sum for feed 1, feed 2, and feed 4. But in my actual DB there's about 100 feeds all with different number of appearances. I'm having troubles finding a good way to do this. If there even is one. Any help or direction would be appreciated!

I'm assuming that your list is a single column and you need a count of the number of occurrences of each feed. For the sake of example, since a column or table names were not supplied, let's call them colname and tablename.
select colname, count(*) as Ct from tablename group by colname

It would be easier to give an exact answer if you posted a small simplified subset of your spreadsheet. But assuming you have a column called "feed_name" which takes on values like "feed 1", "feed 2" etc depending on the row. Then the feed_name column should be a discrete dimension in Tableau.
Then just put the feed_name pill on a shelf, say the row shelf. And put the "Number of Records" field on another shelf, say the column shelf.
You don't need to write SQL to do this (or most tasks) in Tableau. It helps to understand SQL concepts and its very helpful to drop down to the SQL level when needed to solve tricky issues. But for most situations, you can just interactively explore the data by moving fields around and writing some simple calculations -- and let Tableau take care of generating the SQL necessary to retrieve the data needed to build the visualization you requested.
Tableau supports SQL and some NO-SQL data sources, along with some cubes too. It does that quite well and in multiple ways. You just can work more quickly and efficiently by using Tableau's visual based manipulations in most cases, and then drop to the lower level detail when needed. It just takes getting used to how Tableau operates.

Related

Column with multiple data types in Power BI Matrix Table

My issue is similar to this one Multiple data types in a Power BI matrix but I've got a bit of a different setup that's throwing everything off.
What I'm trying to do is create a matrix table with several metrics that are categorized as Current (raw data values) and Prior (year over year percent growth/decline). I've created some dummy data in Excel to get the format the way I want it in PowerBI (see below):
Desired Format
As you can see the Current values are coming in as integers and the Prior % numbers as percentages which is exactly what I want; however, I was able to accomplish this through a custom column with the following formula:
Revenue2 = IF(Scorecard2[Current_Prior] = "Current", FORMAT(FIXED(Scorecard2[Revenue],0), "$#,###"), FORMAT(Scorecard2[Revenue], "Percent"))
The problem is that the data comes from a SQL query and you can't use the FORMAT() function in DirectQuery. Is there a way I can have two different datatypes in the same column of data? See below for how the SQL data comes into PowerBI (I can change this if need be):
SQL
Create 2 separate measures, one for the Current second for Prior, and format these measures.
Probably you can also use a case in SQL query to format your data to bring it as STRING.
What I wound up doing was reformatting the SQL code to look like this:
Solution
That way Current/Prior are have two separate values and the "metric" is categorical.
I got the idea from this post:
Simple way to transpose columns and rows in SQL?

Separating columns ( array of arrays) - Advanced SQL looping

I tried using a name that more accurately describes my question but msg said I am limited to 150 chars.
Looking for assistance from someone who has advanced SQL skills. Ideally I want to do it in SQL to let the computer do the work. Too much manual manipulation is ripe with the possibility of mistakes.
I've already searched for users groups within Google. All emails are being returned saying the email does not exist anymore.
What I am using appears to be a proprietary version of Dremel SQL / Google SQL, however, someone experienced in Dremel SQL will probably be able to guide me in the right direction.
BACKGROUND INFO:
Pulling a column that is an array column which holds another array (a notes column). I think maybe an array of arrays?
I have not figured a way to do what I am trying to do with Google or Dremel SQL yet.
So for now, I am doing it the hard way.
As originally pulled, the data looks like this [{Array of arrays}, {Array of arrays}, {Array of arrays}, etc., repeat... :
More specifically: [{4 or more text fields which could also hold numbers and separated by commas}, {another set of fields}, {another set of fields}...]
I.E. (this is all in just one column of data and hundreds of rows)
[
{"created":"1540236216969","notes": blah... blah... blah", "original_text_length":534, "User_email":"someone#emailaddress.com","user_shortname":"someone"},
{"created":"1540236216969","notes": blah... blah... blah", "original_text_length":1224, "User_email":"someone#emailaddress.com","user_shortname":"someone"},
{"created":"1540236216969","notes": blah... blah... blah", "original_text_length":1664, "User_email":"someone#emailaddress.com","user_shortname":"someone"}
...
]
The number of these is different for each row pulled and each has a specific ID #
A typical row of data is:
ID #, start_date, end_date, some other fields, notes_(the array field)
WHAT I AM DOING NOW is:
SQL data pull,
exporting to google sheets,
make separate tabs for the different array columns.
copying the notes column (the array column holding arrays) to a separate tab on Google Sheets, then
Split Text To Columns using the first curly brace "{" as the separater.
Here is where my dillema is.
Once pulled, I need to split all of those columns again to separate each of the individual elements in each array. Unable to Split text to Columns again with all of them highlighted. I can Split Text to Columns again one at a time but will really be a pain if I have to do that individually for each column and every row (hundreds of rows). Need to find a way to automate this.
I will also need to change each of unix dates to calendar dates within each array PLUS add rows to the spreadsheet depending on the number of columns from the first split. The columns are different for each row depending on how many notes have been added.
OR... do it with SQL (which appears to be a proprietary type of SQL similar to NoSQL but not the same). I have tried using syntax's for IBM SQL, Oracle SQL, SQL Server, and others found online but none work.
OR... do it with a looping function within Google Sheets.
Possibly re-add it to the database as a new table once both sets of arrays are completely split up.
END RESULT
ID#, date1, date 2, first created date (right now a unix date), first note, first other field, etc...
Then add a new row with
Same ID# from above, date1 from row above, date 2 from row above, next (2nd) created date (right now a unix date), 2nd note, 2nd other field, etc...
Add a new row...
3rd set of notes etc.

Dynamically creating a pivot table using fuzzy matching

So, I'm constantly being given data in new and different formats. I'm on a crusade to get my work to standardize data for easy use, and if I managed to convince the powers that be to standardize data, this problem becomes entirely moot. Until then, I have the following problem:
I get data in a variety of ways. Sometimes my gross sales are called total sales. Sometimes gross sales before discounts, total sales before discounts, Gross_Sales, etc. Discounts, deductions, exempt amounts, etc. form another column. So on and so forth. I'd like to be able to do the following:
1) Figure out what columns I want,
2) Turn those columns into a pivot table.
For part 1, I have two options, and I'm wondering if there's anymore: The 1st is to use Microsoft's fuzzy-matching add-in to help me match. I'd have a separate tab dedicated to fuzzy matching each column I need. The second is to just generate a long list of all the variants, and to test each one until I find a hit, assign it, and move onto testing the next one.
The second part is turning all of this into a pivot table - the resouces I have so far are https://www.thespreadsheetguru.com/blog/2014/9/27/vba-guide-excel-pivot-tables and How to Create a Pivot Table in VBA
Is there a better method? Is there another way?
Edit: Slightly better method - Grab the data columns, place them into a table, and pivot everything off of that table - it removes the need to re-create pivot tables, just need to move the data over.
Having the same problem, I use a mix of your two methods.
My data consists of a bunch of logs for rejected x-ray images, and the reject reason is a free text field. My solution was to create a table where the first column contains my desired output categories, and then each subsequent column contains a different variation of it.
For example, a row might have (column one/ouput first entry):
Positioning, POS, Positioning Error, Patient Positioning
Note that these are all fairly different from each other. Where the fuzzy matching comes in - it is used to capture all the smaller differences and mispellings around those other columns. When the fuzzy matching section decides a given reason matches a column's entry, it is then replaced with the appropriate desired output reason from column 1 of the table. In my example, a reason of 'Possitioning Err' [sic] would match to column 3 (Positioning Error) and then get converted to Positioning.
Then wash rinse repeat over the rest of your data as needed. This approach was super useful and fairly flexible in helping standardize my data. It was also computationally more expensive, but you'd only need to run the matching portion once I guess.
As for the actual mechanics of going about doing this - I use 2010, so no inbuilt functionality. I run the fuzzy matching code on a temporary worksheet until best percentage matches are found, and then overwrite the actual source data afterwards.

Is there a way to compare multiple columns in PowerPivot using DAX?

Say I have a table in a PowerPivot that looks like this:
For each row, I want to find the minimum value across Columns 1, 2 and 3 and display that value in the column "MinColumn". The built-in MIN function only seems to operate on a column though, and not a row.
Is there any way to do this, other than some kind of nested IF expression? If we have a lot of columns to compare, that would get very messy, very quickly.
PowerPivot/DAX does some great column-based stuff (to be expected given it's use of xVelocity) but seems to get complex when you start looking at row-level functionality.
Another option is to push the calculation down to the source.
For example, if your source is a database table, you could create a view (or simply use a named query) and calculate the MIN (across the 3 other columns) before you pull the data into PowerPivot.
Note: the TSQL version would also be fairly ugly, PIVOT + MIN() OVER()

Splitting data by column value into an indefinite number of tables using an ETL tool

I'm trying to split a table into multiple tables based on the value of a given column using Talend Open Studio. Let's say this column can contain any of the integer values of 1, 2, 3, etc. then according to this value, these rows should go to table_1, table_2, table_3 etc.
It would be best if I could solve this when the number of different values in that column is not known in advance, but for now we can assume that all these output tables exists already. The bottom line is that the number of different values and therefore the number of different tables are high enough that setting up the individual filters manually is not an option.
Is this possible to solve this using Talend Open Studio or any similiary open source ETL tools like Pentaho Keetle?
Of course, I could just write a simple script myself, but I would prefer to use a proper ETL tool since the complete ETL process is quite complex.
In PDI or Pentaho Kettle you could do this with partitioning. (A right click option on the step IIRC) Partitioning in PDI is designed for exactly this sort of problem.
Yes that's Possible to do and split the data on the basis of single column to different table, but for that you need to create table dynamically :-
tFileInputDelimited->tFlowtoIterate ->tFixedFlowInput->and the can use
globalMap() to get the column values and use the same to seperate the
data to different tables. -> And the can use globalMap(Columnused to
seperate data) in table name.
The first solution that came to my mind was using the replicator to transport the current row to three filters which act as guard and only let rows through with either 1 2 or 3 in the given column. pic: http://i.imgur.com/FmvwU.png
But you could also build the table name dynamically, if that is what you want, pic: http://i.imgur.com/8LR7Q.png