Is there an elegant way to combine QUERY, ARRAYFORMULA, and SORT functions in Google Sheets to do the following?

Google Sheets problem: I have a master list with columns for employer, job post, # of spots, parameter x, parameter y, etc.
"Master Sheet" #a tab
Employers Job Spots
John Cleaner 1
Mike Cleaner 2
John Cleaner 3
John Server 5
Alice Cook 1
Dave Cook 1
Mary Cleaner 3
Alice Server 5
Alice Cleaner 2
Dave Server 4
Mike Server 3
Alice Server 1
This is what I would like: an "Output Sheet" (another tab) with two columns. The first is Job and the second is the # of employers that account for 80% of the jobs in that category, plus any additional filters. The idea is to produce a single number that gives an 80/20-rule type of metric. The trick is to sort one column from highest to lowest first. I can do this, but only in multiple steps that seem annoyingly inefficient. Is there a better way, where I can put everything in one cell and drag down, or use a QUERY function? The output looks like this:
Job # of employers that account for ~80% of all the jobs in that category + filters
Cleaner ~3
Cook 1
Server ~3
# Because the total of Cleaner jobs is 11, 80% is 8.8. Sorting employers from highest to lowest (after combining duplicates), 3 employers represent 80% of the Cleaner jobs available. The Server total in the data above is 18, 80% is 14.4, so ~3 employers represent 80% of the Server jobs available.
Thank you all for your help.

To get 80% of each job's total:
=query(A15:C26, "Select B, sum(C)*80/100 group by B label B 'Job'")
For Cleaner, Cook, and Server this returns
{8.8, 1.6, 14.4}
From there you can continue on your own.
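For comparison, the full calculation (combine duplicates, sort descending, count employers until the running total reaches 80%) can be sketched outside Sheets. This is a minimal Python sketch, not a Sheets formula; the function name `employers_for_share` is made up for illustration, and it assumes a strict "at least 80%" rule. Under that strict rule Cook comes out as 2 rather than the approximate 1 in the question, since a single employer covers only 50% of the Cook jobs.

```python
from collections import defaultdict

# Rows copied from the "Master Sheet" example: (employer, job, spots).
ROWS = [
    ("John", "Cleaner", 1), ("Mike", "Cleaner", 2), ("John", "Cleaner", 3),
    ("John", "Server", 5), ("Alice", "Cook", 1), ("Dave", "Cook", 1),
    ("Mary", "Cleaner", 3), ("Alice", "Server", 5), ("Alice", "Cleaner", 2),
    ("Dave", "Server", 4), ("Mike", "Server", 3), ("Alice", "Server", 1),
]


def employers_for_share(rows, percent=80):
    """Return {job: number of top employers covering at least percent% of spots}."""
    # Combine duplicate (job, employer) rows first.
    spots = defaultdict(lambda: defaultdict(int))
    for employer, job, n in rows:
        spots[job][employer] += n

    result = {}
    for job, per_employer in spots.items():
        total = sum(per_employer.values())
        running = 0
        count = 0
        # Walk employers from largest to smallest until the threshold is met.
        for n in sorted(per_employer.values(), reverse=True):
            running += n
            count += 1
            if running * 100 >= total * percent:  # integer math avoids float rounding
                break
        result[job] = count
    return result


print(employers_for_share(ROWS))
```

Any extra filters (parameter x, parameter y, ...) would be applied to `ROWS` before calling the function.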

Related

Using Count and Group By in Power BI

I have a table that contains data about different benefit plans and the users enrolled in one or more of those plans. So basically the table contains two columns representing the benefit plans and the users enrolled in those plans.
I need to create a visualization in Power BI to represent the total number of users enrolled in 1 plan, 2 plans, 3 plans, etc.
I wrote a query in SQL to get the desired result, but I am not sure how to do the same in Power BI.
Below is my SQL query:
SELECT S.PlanCount, COUNT(S.UserName) AS Participants
FROM (
SELECT A.Username, COUNT(*) AS PlanCount
FROM [dbo].[vw_BenefitsCount_Plan_Participants] AS A
GROUP BY A.username
)AS S
GROUP BY S.PlanCount
ORDER BY S.PlanCount
The query result is shown in the image below:
So here, the PlanCount column represents the total number of different benefit plans that users are enrolled in. For example, the first row means that a total of 6008 members are enrolled in only 1 plan, whereas row 2 shows that a total of 3030 members are enrolled in 2 plans, and similarly row 5 means there are only 10 users who are enrolled in a total of 6 plans.
I am new to Power BI and trying to understand DAX functions, but I couldn't find a reasonable example that could help me create my visualization.
I found something similar here and here, but those seem to be geared more towards a single count and group by.
Here is a simple example. I have a table of home owners who have homes in multiple cities.
In this table, Alex, Dave and Julie each have a home in 1 city (basically, these 3 people own just 1 home each). Similarly, Jim owns a total of 2 homes, and Bob and Pam each have 3 homes in total.
The output I need is a table with the total number of home owners that own 1 home, 2 homes, and so on. The resulting table in SQL is this.
Here NameCount is the count of home owners and Homes is the number of homes those home owners have.
Please let me know if this helps.
Thanks.
If I understood correctly, you have a table like this:
BenefitPlan | User
1 | Max
1 | Joe
2 | Max
3 | Anna
If that's right, you can simply use a bar chart (for example) where the Axis is BenefitPlan and the Value is User. When you drag a column into the Value field, it is grouped automatically (like GROUP BY in SQL), and by default the grouping method is count.
Hope it helps.
Regards.
You can use DAX to create a summary table from your data table:
https://community.powerbi.com/t5/Desktop/Creating-a-summary-table-out-of-existing-table-assistance/td-p/431485
Once you have counted plans by customer you will then have a field that will enable you to visualize the # of customers with each count.
Mock-up of the code:
PlanSummary = SUMMARIZE('vw_BenefitsCount_Plan_Participants', [Username], "PlanCount", COUNT([PLAN_ID]))
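The two-level grouping in the SQL query (count rows per user, then count users per count) can be checked on a small data set before building it in DAX. Below is a minimal Python sketch using the home-owner example from the question; the function name `owners_per_count` and the flat list of names are assumptions made for illustration.

```python
from collections import Counter


def owners_per_count(names):
    """Map 'number of rows per name' to 'how many names have that many rows'."""
    per_owner = Counter(names)                # inner GROUP BY: rows per name
    return dict(Counter(per_owner.values()))  # outer GROUP BY: names per row count


# One entry per (owner, home) row, as described in the question.
homes = ["Alex", "Dave", "Julie", "Jim", "Jim",
         "Bob", "Bob", "Bob", "Pam", "Pam", "Pam"]

print(owners_per_count(homes))
```

Three owners have 1 home, one has 2, and two have 3, which is the shape the summary table in Power BI should reproduce.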

Updating a database column based on its similarity to another database column

I have a database table (Customers) with the following columns:
ID
FIRST_NAME
MIDDLE_INIT
LAST_NAME
FULL_NAME
I also have a database table (ENG) with the following columns:
ID
ENG_NAME
I want to replace all of the ENG.ENG_NAME entries with a FULL_NAME entry from the CUSTOMERS table
Here is the problem.
The ENG_NAME was hand-jammed through a web form and so has no consistency. For instance, one row might contain "Robin Hood", another "Hood, Robin L", and another "Robin L Hood".
I want to search the entries in the CUSTOMERS table, find a close match, then replace the ENG.ENG_NAME with the CUSTOMERS.FULL_NAME.
Example:
ENG table            CUSTOMERS table
ID  ENG_NAME         ID  FULL_NAME       FIRST_NAME  MIDDLE_INIT  LAST_NAME
================     ==================================================================
1   Hood,Robin       1   Robin L Hood    Robin       L            Hood
2   Rob Hood         2   Maid M Marion   Maid        M            Marion
3   Marion M         3   Friar F Tuck    Friar       F            Tuck
4   Rob Garza        4   Robert A Garza  Robert      A            Garza
Based on the data above, I would want ENG_NAME columns to be replaced like this:
ENG table
ID  ENG_NAME
====================
1   Robin L Hood
2   Robin L Hood
3   Maid M Marion
4   Robert A Garza
Any thoughts on how to do this?
Thanks
This is not going to be a simple task. I would start by finding a good C# (or any .NET) algorithm that detects similar string portions.
Then look at compiling C# code into SQL stored procedures and invoking that code from SQL Server. This CLR code can then write the results to a table for you to analyze and do whatever you want with.
For more: CLR SQL Server User-Defined Function
I would do it in .NET using Levenshtein distance.
Start with a distance of 1; you are going to have some ties, and you need to decide how to resolve them.
Then move on to distances of 2, 3, 4, ...
You could do it in a CLR, but how are you going to deal with ties? And you are going to have ties. How are you going to decide when it is not a match at all?
I would also put the result in a new column so you keep a history of the original data.
Or use a FK reference to the customers table.
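To make the Levenshtein suggestion concrete, here is a minimal sketch in Python rather than .NET. The helper `best_matches` is a made-up name; it returns every candidate tied at the smallest distance so that ties can be reviewed by hand, and comparing lowercased strings is an assumption, not something the answers specify.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def best_matches(name, candidates):
    """Return (smallest distance, all candidates tied at that distance)."""
    scored = [(levenshtein(name.lower(), c.lower()), c) for c in candidates]
    best = min(d for d, _ in scored)
    return best, [c for d, c in scored if d == best]


FULL_NAMES = ["Robin L Hood", "Maid M Marion", "Friar F Tuck", "Robert A Garza"]
print(best_matches("Rob Garza", FULL_NAMES))
```

Plain edit distance handles abbreviations such as "Rob Garza" reasonably, but reordered entries such as "Hood,Robin" score poorly against "Robin L Hood", so some normalization of word order would likely be needed first. A cutoff above which a row counts as "no match" still has to be chosen by inspecting the data.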

Eliminate duplicate records/rows?

I'm trying to list the result of a multi-table query with one row and 2 columns. I have the correct data that I need; I merely need to trim it down to 1 line of results. In other words, eliminate duplicate entries in the result. I'm using a value not shown here, school_id. Should I go with that as a distinct value? Can I do that without displaying the school_id?
SQL> select DISTINCT(school_name),Team_Name
2 from school, team
3 where team.team_name like '%B%'
4 AND school.school_id = team.school_id;
SCHOOL_NAME TEAM_NAME
-------------------------------------------------- ----------
Lawrence Central High School Bears
Lawrence Central High School BEars
Lawrence Central High School BEARS
The problem, as I'm sure you know, is that "Bears" appears in 3 different cases here. The simple fix is to apply UPPER or LOWER to "Team_Name" in the select list so that only 1 record is returned.
UPPER(Team_Name)
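A quick way to see the effect is to reproduce the three rows in a throwaway database. The sketch below uses Python's built-in sqlite3 module instead of Oracle, so the table definitions are assumptions reconstructed from the query; the point is only that DISTINCT collapses the rows once the case is normalized, and that school_id never has to appear in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE school (school_id INTEGER PRIMARY KEY, school_name TEXT);
    CREATE TABLE team (team_name TEXT, school_id INTEGER);
    INSERT INTO school VALUES (1, 'Lawrence Central High School');
    INSERT INTO team VALUES ('Bears', 1), ('BEars', 1), ('BEARS', 1);
""")

# DISTINCT applies to the whole select list, so normalizing the case of
# team_name makes the three rows identical and they collapse into one.
rows = conn.execute("""
    SELECT DISTINCT school.school_name, UPPER(team.team_name)
    FROM school
    JOIN team ON school.school_id = team.school_id
    WHERE team.team_name LIKE '%B%'
""").fetchall()

print(rows)
```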

How to tally and store votes for a web site?

I am using SQL Server 2005.
I have a site where people can vote on awesome motorcycles. Each time a user votes, there is one vote for the first bike and one vote against the second bike. Two votes are stored in the database. The vote table looks like this:
VoteID VoteDate BikeID Vote
1 2012-01-12 123 1
2 2012-01-12 125 0
3 2012-01-12 126 0
4 2012-01-12 129 1
I want to tally the votes for each bike quite frequently, say each hour. My idea is to store the tally as a percentage of contests won versus lost on the bike table, as an attribute of the bike. So, if a bike won 10 contests and lost 20 contests, it would have a score (tally) of 33. I would tally up daily, weekly, and monthly scores.
BikeID BikeName DailyTally WeeklyTally MonthlyTally
1 Big Dog 5 10 50
2 Big Cat 3 15 40
3 Small Dog 9 8 0
4 Fish Face 19 21 0
Right now, there are about 500 votes per day being cast. We anticipate 2500 - 5000 per day in the next month or so.
What is the best way to tally the data and what is the best way to store it? Should the tallies be on their own table? Should a trigger be used to run a new tally each time a bike is voted on? Should a stored procedure be run hourly to get all tallies?
Any ideas would be very helpful!
Store your VoteDate as a datetime value instead of just date.
For your tallies, you can just make that a view and calculate it on the fly. This should be very simple to do using GROUP BY and DATEPART functions. If you need exact code for how to do this, please open a new question.
For that low volume of rows it doesn't make any sense to store aggregations in a table when you can just calculate them whenever you want to see them and get accurate and immediate results that are up-to-date.
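As a rough illustration of calculating the tally on the fly, here is a sketch using Python's built-in sqlite3 module rather than SQL Server 2005. The `tally` helper and the timestamp format are assumptions; in SQL Server the same GROUP BY would live in a view or stored procedure, with the date filter written using its own date functions.

```python
import sqlite3


def tally(conn, since):
    """Percentage of contests won per bike for votes cast at or after `since`."""
    rows = conn.execute(
        """
        SELECT BikeID, CAST(ROUND(100.0 * SUM(Vote) / COUNT(*)) AS INTEGER)
        FROM Vote
        WHERE VoteDate >= ?
        GROUP BY BikeID
        ORDER BY BikeID
        """,
        (since,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Vote (VoteID INTEGER PRIMARY KEY, VoteDate TEXT, "
    "BikeID INTEGER, Vote INTEGER)"
)
# Bike 123 wins 10 contests and loses 20, as in the question's example.
votes = [("2012-01-12 09:00:00", 123, 1)] * 10
votes += [("2012-01-12 10:00:00", 123, 0)] * 20
# Bike 125 has one older win and one recent loss.
votes += [("2012-01-01 08:00:00", 125, 1), ("2012-01-12 11:00:00", 125, 0)]
conn.executemany(
    "INSERT INTO Vote (VoteDate, BikeID, Vote) VALUES (?, ?, ?)", votes
)

print(tally(conn, "2012-01-12 00:00:00"))  # daily window
print(tally(conn, "2012-01-01 00:00:00"))  # wider window
```

Storing VoteDate with a time component is what allows the same query to serve hourly, daily, weekly, and monthly windows just by changing `since`.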
I agree with @JNK: try a view or just a normal stored proc to calculate the outputs on the fly. If you find it becomes too slow as your data grows, I would investigate other routes then (like caching the data in another table). It is probably worth keeping it simple to start with; you can always reuse the logic from the SP/view later if you do want to set up a scheduled task.
Edit:
Removed the indexed view; as per @Damien_The_Unbeliever's comments, it's not deterministic and I'm stupid :)

Sort Excel Grouped Rows

I have a spreadsheet that has information in groups. The header rows contain company names and information, and the grouped rows beneath them contain the names of people in the company.
Company Name | Number of Employees | Revenue |
Employee Name | Email | Phone
Is there any way to sort by the number of employees and/or revenue and keep the grouped employee information below the company it belongs to?
Normally when I try it, it sorts the company information but keeps the employee information in the order in which it was entered.
If I understand your question correctly, I have a way you can accomplish what you want (don't know if there is a more efficient method).
Write code which will, for each company header row, copy the number of employees and revenue data into two chosen unused columns. The data needs to be copied into those columns for both the company header row and its detail employee rows.
In a third column, assign a sequence number. This is to keep the data together and in order when sorting by employees/revenue.
Now you can sort by either of the newly created number-of-employees and/or revenue columns (along with the sequence column to maintain ordering within each company).
After the sort you can delete the extra copied data columns.
So if your data looked like this to start with...
A               B                  C
Penetrode       200                750000
Micheal Bolton  mbolton@pene.com   555-555-3333
Samir N         samirn@pene.com
Initech         500                500000
Bill Lumbergh   umumyeah@init.com  555-555-1212
Peter Gibbons   pgibbons@init.com  555-555-2222
Your code would then copy the employee count and revenue data and sequencify the rows using three unused columns.
A               B                  C             D    E       F
Penetrode       200                750000        200  750000  1
Micheal Bolton  mbolton@pene.com   555-555-3333  200  750000  2
Samir N         samirn@pene.com    555-555-3334  200  750000  3
Initech         500                500000        500  500000  4
Bill Lumbergh   umumyeah@init.com  555-555-1212  500  500000  5
Peter Gibbons   pgibbons@init.com  555-555-2222  500  500000  6
Then you can code a sort on any of the column combos: (D,F), (E,F), (D,E,F), or (E,D,F)
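The helper-column idea can be sketched in a few lines. This is a Python illustration rather than Excel VBA, and it rests on two assumptions: company header rows are the ones with a number in column B, and the first row is a header. The function name `sort_groups` is made up for the example.

```python
def sort_groups(rows, key_col):
    """Sort company groups by a header-row column, keeping employees under their company."""
    tagged = []
    key = None
    for seq, row in enumerate(rows):
        # Assumption: only company header rows hold a number in column B.
        if isinstance(row[1], int):
            key = row[key_col]
        # Equivalent of the helper columns: copied company value plus sequence number.
        tagged.append((key, seq, row))
    # Sort on (company value, sequence) so each group stays together and in order.
    tagged.sort(key=lambda t: (t[0], t[1]))
    # Drop the helper values again, like deleting the extra columns after the sort.
    return [row for _, _, row in tagged]


rows = [
    ("Penetrode", 200, 750000),
    ("Micheal Bolton", "mbolton@pene.com", "555-555-3333"),
    ("Samir N", "samirn@pene.com", ""),
    ("Initech", 500, 500000),
    ("Bill Lumbergh", "umumyeah@init.com", "555-555-1212"),
    ("Peter Gibbons", "pgibbons@init.com", "555-555-2222"),
]

for row in sort_groups(rows, key_col=2):  # sort by revenue, ascending
    print(row)
```

Sorting by `key_col=1` uses the employee count instead; because the sequence number is part of the sort key, two companies with the same value still keep their own employees directly beneath them.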
Better late than never, I suppose, but I feel my LAselect plugin would have solved your problem. I created this plugin because I do much non-standard 'stuff' with my data and needed a tool to handle it. LAselect can produce your 'group' output too and you would not need hidden columns or anything. I mean, you would not need to change the screens you are used to to sort them in whatever way you wanted.