I am not a programmer, but doing Excel work for a small library. We have these fields in an excel sheet:
John | J | Smith | BMI | 123 | 100 |
Sarah | P | Crown | ASCAP | 564 | 100 |
Tommy | T | Stew | BMI | 134 | 100 |
Suzy | S | Smith | BMI | 678 | 50 |
John | J | Smith | BMI | 123 | 50
What I would like to be able to combine any of the cells (in the same row)into one cell that would read like this:
John J Smith, (BMI), 100%, IPI 123
or
Suzy S Smith, (BMI), 50%, IPI 678 | John J Smith (BMI), 50% IPI 123
I figured out how to use the Concatenate function to do this, but it doesn't skip empty cells, and I get extra "|" or "()" in those spots. I also found the =StringConCat topic, and that works great for skipping, but I can't figure out how to add the extra characters.
Any help would be most appreciated.
Thank you!!
EDIT: Thanks for the quick responses so far. I should be more clear -
the pipes in my example were only to designate different cells - they are not actual characters in the cells (thanks for converting it to a table for me, Bruce). The only Pipe character I would like to use is in the results, as in my example between Suzy and John.
There will rarely be more than 2 entries on the same result line, but it is possible. Mostly it will be to composers that are sharing the credit. But there is a chance that they will work on a Public Domain song and I have to list "Traditional" or maybe "Mozart" as another composer.
Sorry that I don't know enough to ask my question as intelligently as I should. Just learning how to do this, and trying to figure it out as I go.
Thanks again!
For the extra spaces, use substitute to get rid of empties.
So, if your code is =concatenate(A1,B1,C1) and your 'empty spaces' are "| " then edit your formula to become =substitute(concatenate(A1,B1,C1),"| ","")
You can even stack the substitutes to add more possible 'empties', like " " (two spaces) or the like. =substitute(substitute(concatenate(A1,B1,C1),"| ","")," ","")
Related
I'm wracking my brain trying to figure this out. I have a dataset / table that looks like this:
ID | Person1 | Person2 | Person3 | EffortPerPerson
01 | Bob | Ann | Frank | 2
02 | Frank | Bob | Joe | 3
03 | Ann | Joe | Beth | 1
I'm trying add up "Effort" for each person. For example, Bob is 2+3, Joe is 3+1, etc. My goal is to produce a PowerBI scatter plot showing total Effort for each person.
In a perfect world, the query shouldn't care how many "Person" fields there are. It should just count up the Effort value for every row that the individual's name appears.
I thought GROUP BY would work, but obviously that's only for one column, and I can't wrap my head around how to make nested queries work here.
Any one have any ideas? Thanks in advance!
As Nick suggested, you should go with the Unpivot transformation. Go to Edit Queries and select Transform tab:
Select columns you want to transform in rows, open dropdown menu under Unpivot Columns and select "Unpivot Only Selected Columns":
And that's it! Power BI will aggregate values for you:
I'm trying to aggregate some instructor data (to easily show which courses an instructor taught in a semester), and up until now I've just accepted having multiple rows for each instructor. However, it would be beneficial to some business processes if I could have all of an instructor's teaching in a single row. Here is some example data (my tables have a lot more columns, but the general idea won't change much.
tbl_Instructors has:
N_ID | F_Name | L_Name
001 Joe Smith
002 Henry Fonda
003 Lou Reed
tbl_Courses has:
Course_ID | N_ID | Course_Info
AAA 001 PHYS 1
AAB 001 PHYS 2
CCC 002 PHYS 12
DDD 003 PHYS 121
FFF 003 PHYS 224
What I want to return is:
N_ID | First_Name | Last_Name | Course_IDs
001 Joe Smith AAA, AAB
002 Henry Fonda CCC
003 Lou Reed DDD, FFF
I think I need to do something with selecting all N_IDs from tbl_Instructors, then returning the Course_IDs from tbl_Courses via concatenation, but that magic step has alluded me. Any help? Can I do this via SQL selects or will I need to use VB?
This is easy using Allen Browne's ConcatRelated() function. Copy the function from that web page and paste it into an Access standard code module.
Then this query will return what you asked for.
SELECT
i.N_ID,
i.F_Name,
i.L_Name,
ConcatRelated(
"Course_ID",
"tbl_Courses",
"N_ID = '" & [N_ID] & "'"
) AS Course_IDs
FROM tbl_Instructors AS i;
Consider changing the data type of N_ID from text to numeric in both tables. If you do that, you don't need the single quotes in the third argument to that ConcatRelated() expression.
"N_ID = " & [N_ID]
And whenever you need N_ID displayed with leading zeros, use a Format() expression.
Format(N_ID, "000")
This may be a strange question, but I'm wondering how users on sites like stackoverflow format the example resultsets when asking / answering questions? Is there a clean and easy way to create something like:
+----------+-------------------+
| Count(*) | MAX(created_date) |
+----------+-------------------+
| 234 | 10-may-14 |
| 847 | 03-Apr-14 |
+----------+-------------------+
It does a great job at representing everything in plain text, I'm just wondering if everyone takes the time to format that manually? Or is that an export option for some sort of database software? Something tells me I'm showing my inexperience here haha.
After much searching, I found what I was looking for! This simple tool lets you mock up your table/data in excel, then copy/paste in the default tab delimited format. It then converts it into the ASCII style table seen in my question. Worth a bookmark in my opinion.
http://www.sensefulsolutions.com/2010/10/format-text-as-table.html
Your example is:
+----------+-------------------+
| Count(*) | MAX(created_date) |
+----------+-------------------+
| 234 | 10-may-14 |
| 847 | 03-Apr-14 |
+----------+-------------------+
I think the following is a pretty desirable format:
Count(*) MAX(created_date)
234 10-may-14
847 03-Apr-14
(The only improvement is to represent the dates in ISO standard YYYY-MM-DD format. ;-)
To get this format, just prepend each row with four spaces. That creates the background shading and the fixed-width font. You can also omit the first four spaces and just type in what you want:
Count(*) MAX(created_date)
234 10-may-14
847 03-Apr-14
Oh. That looks ugly. Select the three lines and click on the {} just above the input box and the four spaces are added automagically.
Say I have an employee table, with a record for each employee in my company, and a column for supervisor (as seen below). I would like to prepare a report, which lists the names and title for each step in a supervision line. eg for dick robbins, 1d #15, i'd like a list of each supervisor in his "chain of command," all the way to the president, big cheese. I'd like to avoid using cursors, but if that's the only way to do this then that's ok.
id fname lname title supervisorid
1 big cheese president 1
2 jim william vice president 1
3 sally carr vice president 1
4 ryan allan senior manager 2
5 mike miller manager 4
6 bill bryan manager 4
7 cathy maddy foreman 5
8 sean johnson senior mechanic 7
9 andrew koll senior mechanic 7
10 sarah ryans mechanic 8
11 dana bond mechanic 9
12 chris mcall technician 10
13 hannah ryans technician 10
14 matthew miller technician 11
15 dick robbins technician 11
The real data probably won't be more than 10 levels deep...but I'd rather not just do 10 outside joins...I was hoping there was something better than that, and less involved than cursors.
Thanks for any help.
This is basically a port of the accepted answer on my question that I linked to in the OP comments.
you can use common-table expressions
WITH Family As
(
SELECT e.id, e.supervisorid, 0 as Depth
FROM Employee e
WHERE id = #SupervisorID
UNION All
SELECT e2.ID, e2.supervisorid, Depth + 1
FROM Employee e2
JOIN Family
On Family.id = e2.supervisorid
)
SELECT*
FROM Family
For more:
Recursive Queries Using Common Table Expressions
You might be interested in the "Materialized Path" solution, which does slightly de-normalize the table but can be used on any type of SQL database and prevents you from having to do recursive queries. In fact, it can even be used on no-SQL databases.
You just need to add a column which holds the entire ancestry of the object. For example, the table below includes a column named tree_path:
+----+-----------+----------+----------+
| id | value | parent | tree_path|
+----+-----------+----------+----------+
| 1 | Some Text | 0 | |
| 2 | Some Text | 0 | |
| 3 | Some Text | 2 | -2-|
| 4 | Some Text | 2 | -2-|
| 5 | Some Text | 3 | -2-3-|
| 6 | Some Text | 3 | -2-3-|
| 7 | Some Text | 1 | -1-|
+----+-----------+----------+----------+
Selecting all the descendants of the record with id=2 looks like this:
SELECT * FROM comment_table WHERE tree_path LIKE '-2-%' ORDER BY tree_path ASC
To build a tree, you can sort by tree_path to get an array that's fairly easy to convert to a tree.
You can also index tree_path and the index can be used when the wildcard is not at the beginning.
For example, tree_path LIKE '-2-%' can use the index, but tree_path LIKE '%-2-' cannot.
Some recursive function which either return the supervisor (if any) or null. Could be a SP which invokes itself as well, and using UNION.
SQL is a language for performing set operations and recursion is not one of them. Further, many database systems have limitations on recursion using stored procedures as a safety measure to prevent rogue code from running away with precious server resources.
So, when working with SQL always think 'flat', not 'hierarchical'. So I would highly recommend the 'tree_path' method that has been suggested. I have used the same approach and it works wonderfully and crucially, very robustly.
Background
To replace invalid zip codes.
Sample Data
Consider the following data set:
Typo | City | ST | Zip5
-------+------------+----+------
33967 | Fort Myers | FL | 33902
33967 | Fort Myers | FL | 33965
33967 | Fort Myers | FL | 33911
33967 | Fort Myers | FL | 33901
33967 | Fort Myers | FL | 33907
33967 | Fort Myers | FL | 33994
34115 |Marco Island| FL | 34145
34115 |Marco Island| FL | 34146
86405 | Kingman | FL | 86404
86405 | Kingman | FL | 86406
33967 closely matches 33965, although 33907 could also be correct. (In this case, 33967 is a valid zip code, but not in our zip code database.)
34115 closely matches is 34145 (off by one digit, with a difference of 3 for that digit).
86405 closely matches both.
Sometimes digits are simply reversed (e.g,. 89 instead of 98).
Question
How would you write a SQL statement that finds the "minimum distance" between multiple numbers that have the same number of digits, returning at most one result no matter what?
Ideas
Subtract the digits.
Use LIMIT 1.
Conditions
PostgreSQL 8.3
This sounds like a case for Levenshtein distance.
The Levenshtein distance between two
strings is defined as the minimum
number of edits needed to transform
one string into the other, with the
allowable edit operations being
insertion, deletion, or substitution
of a single character.
It looks like PostgreSQL has it built-in:
test=# SELECT levenshtein('GUMBO', 'GAMBOL');
levenshtein
-------------
2
(1 row)
http://www.postgresql.org/docs/8.3/static/fuzzystrmatch.html
Redfilter answered the question that was asked, but I just wanted to clarify that the requested solution will not resolve what appears to be the real problem.
The real problem here seems to be that you have a database which was hand keyed and some numbers were transcribed giving garbage data.
The ONLY way to solve this problem is to validate the full address against a database like the USPS, MapQuest, or another provider. I know the first two have API's available for doing this.
The example I gave in a comment above was to consider a zip of 75084 and a city value of Richardson. Richardson has zip codes in the range of 75080, 81, 82, 83, and 85. The minimum number of edits will be 1. However, which one?
Another equal problem is what if the entered zip code was 75083 for Richardson. Which is a valid zipcode for that city; however, what if the address resided in 75082?
The only way to get that is to have the full address validated.