Representing resultsets in text "tables" - sql

This may be a strange question, but I'm wondering how users on sites like Stack Overflow format the example resultsets when asking or answering questions. Is there a clean and easy way to create something like:
+----------+-------------------+
| Count(*) | MAX(created_date) |
+----------+-------------------+
| 234      | 10-may-14         |
| 847      | 03-Apr-14         |
+----------+-------------------+
It does a great job of representing everything in plain text. I'm just wondering whether everyone takes the time to format that manually, or whether it's an export option in some sort of database software. Something tells me I'm showing my inexperience here, haha.

After much searching, I found what I was looking for! This simple tool lets you mock up your table/data in Excel, then copy/paste it in the default tab-delimited format. It then converts it into the ASCII-style table seen in my question. Worth a bookmark in my opinion.
http://www.sensefulsolutions.com/2010/10/format-text-as-table.html
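For anyone who would rather script the conversion, here is a minimal Python sketch of the same idea: tab-delimited text in, ASCII-style table out. It is not the linked tool's code; the function name format_table is made up for illustration, and it assumes the first row is the header and every row has the same number of cells.

```python
def format_table(tsv_text):
    """Turn tab-delimited text (first row = header) into an ASCII-style table."""
    rows = [line.split("\t") for line in tsv_text.strip().splitlines()]
    # Each column is as wide as its longest cell.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    border = "+" + "+".join("-" * (width + 2) for width in widths) + "+"

    def render(row):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    return "\n".join([border, render(header), border, *map(render, body), border])


if __name__ == "__main__":
    pasted = "Count(*)\tMAX(created_date)\n234\t10-may-14\n847\t03-Apr-14"
    print(format_table(pasted))
```

Pasting the result into a post still needs the four-space indent (or the {} button) so that it renders in a fixed-width font.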

Your example is:
+----------+-------------------+
| Count(*) | MAX(created_date) |
+----------+-------------------+
| 234      | 10-may-14         |
| 847      | 03-Apr-14         |
+----------+-------------------+
I think the following is a pretty desirable format:
    Count(*)  MAX(created_date)
    234       10-may-14
    847       03-Apr-14
(The only improvement is to represent the dates in ISO standard YYYY-MM-DD format. ;-)
To get this format, just prepend each row with four spaces. That creates the background shading and the fixed-width font. You can also omit the first four spaces and just type in what you want:
Count(*) MAX(created_date)
234 10-may-14
847 03-Apr-14
Oh. That looks ugly. Select the three lines and click on the {} just above the input box and the four spaces are added automagically.

Related

PowerBI Report or SQL Query Grouping Data Spanning Columns

I'm wracking my brain trying to figure this out. I have a dataset / table that looks like this:
ID | Person1 | Person2 | Person3 | EffortPerPerson
01 | Bob | Ann | Frank | 2
02 | Frank | Bob | Joe | 3
03 | Ann | Joe | Beth | 1
I'm trying to add up "Effort" for each person. For example, Bob is 2+3, Joe is 3+1, etc. My goal is to produce a PowerBI scatter plot showing total Effort for each person.
In a perfect world, the query shouldn't care how many "Person" fields there are. It should just count up the Effort value for every row that the individual's name appears.
I thought GROUP BY would work, but obviously that's only for one column, and I can't wrap my head around how to make nested queries work here.
Anyone have any ideas? Thanks in advance!

As Nick suggested, you should go with the Unpivot transformation. Go to Edit Queries and select the Transform tab:
Select the columns you want to transform into rows, open the dropdown menu under Unpivot Columns, and select "Unpivot Only Selected Columns":
And that's it! Power BI will aggregate the values for you.
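To make the logic of the unpivot step concrete, here is a rough Python sketch of the same idea outside Power BI: treat every "Person" column as just another (person, effort) pair and sum per person. The function name total_effort is invented for this example, and it assumes the person columns all start with "Person", as in the sample table.

```python
from collections import defaultdict

# Sample rows copied from the question.
rows = [
    {"ID": "01", "Person1": "Bob", "Person2": "Ann", "Person3": "Frank", "EffortPerPerson": 2},
    {"ID": "02", "Person1": "Frank", "Person2": "Bob", "Person3": "Joe", "EffortPerPerson": 3},
    {"ID": "03", "Person1": "Ann", "Person2": "Joe", "Person3": "Beth", "EffortPerPerson": 1},
]


def total_effort(rows):
    """Unpivot every Person* column and sum EffortPerPerson per name."""
    totals = defaultdict(int)
    for row in rows:
        for column, value in row.items():
            # Any column whose name starts with "Person" counts, however many there are.
            if column.startswith("Person") and value:
                totals[value] += row["EffortPerPerson"]
    return dict(totals)


if __name__ == "__main__":
    print(total_effort(rows))
```

Because the loop looks at column names rather than a fixed list, adding a Person4 column would not require any change.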

Excel Concatenate Cells while adding characters and skipping Empty cells

I am not a programmer, but I am doing Excel work for a small library. We have these fields in an Excel sheet:
John | J | Smith | BMI | 123 | 100
Sarah | P | Crown | ASCAP | 564 | 100
Tommy | T | Stew | BMI | 134 | 100
Suzy | S | Smith | BMI | 678 | 50
John | J | Smith | BMI | 123 | 50
What I would like is to be able to combine any of the cells (in the same row) into one cell that would read like this:
John J Smith, (BMI), 100%, IPI 123
or
Suzy S Smith, (BMI), 50%, IPI 678 | John J Smith (BMI), 50% IPI 123
I figured out how to use the Concatenate function to do this, but it doesn't skip empty cells, and I get extra "|" or "()" in those spots. I also found the =StringConCat topic, and that works great for skipping, but I can't figure out how to add the extra characters.
Any help would be most appreciated.
Thank you!!
EDIT: Thanks for the quick responses so far. I should be more clear -
the pipes in my example were only to designate different cells - they are not actual characters in the cells (thanks for converting it to a table for me, Bruce). The only Pipe character I would like to use is in the results, as in my example between Suzy and John.
There will rarely be more than 2 entries on the same result line, but it is possible. Mostly it will be to composers that are sharing the credit. But there is a chance that they will work on a Public Domain song and I have to list "Traditional" or maybe "Mozart" as another composer.
Sorry that I don't know enough to ask my question as intelligently as I should. Just learning how to do this, and trying to figure it out as I go.
Thanks again!

For the extra spaces, use substitute to get rid of empties.
So, if your code is =concatenate(A1,B1,C1) and your 'empty spaces' are "| " then edit your formula to become =substitute(concatenate(A1,B1,C1),"| ","")
You can even stack the substitutes to add more possible 'empties', like "  " (two spaces) or the like. =substitute(substitute(concatenate(A1,B1,C1),"| ",""),"  ","")
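For comparison, the underlying "skip the empties, then add the punctuation" logic can be sketched in a few lines of Python. This is not an Excel formula, just an illustration of the rule; the function names and the field order (first, middle, last, society, IPI, share) are assumptions based on the sample rows in the question.

```python
def format_composer(first, middle, last, society, ipi, share):
    """Build one credit such as 'John J Smith, (BMI), 100%, IPI 123', skipping empty fields."""
    name = " ".join(part for part in (first, middle, last) if part)
    pieces = [
        name,
        f"({society})" if society else "",
        f"{share}%" if share else "",
        f"IPI {ipi}" if ipi else "",
    ]
    # Only the non-empty pieces get joined, so no stray commas or "()" appear.
    return ", ".join(piece for piece in pieces if piece)


def format_credit(composers):
    """Join several composers on one line with a pipe between them."""
    return " | ".join(format_composer(*composer) for composer in composers)


if __name__ == "__main__":
    print(format_composer("John", "J", "Smith", "BMI", "123", "100"))
    print(format_credit([
        ("Suzy", "S", "Smith", "BMI", "678", "50"),
        ("Mozart", "", "", "", "", ""),
    ]))
```

The decoration (parentheses, "%", "IPI ") is attached to each field before the join, which is why an empty field leaves nothing behind.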

Why am I getting different results via LINQ to Entities than by running the SQL generated by the same query?

I'm working on a school project that was started by another group last semester. This semester I'm on a team that is tasked with completing this project. There are ZERO common people between the groups .... my team is a completely new team attempting to finish another team's project with little to no documentation.
Anyway, with that background out of the way, I am having an issue with the project. My Entity Framework model seems to not like the views I have created. It may also be worth mentioning that the view in question is a complex one, created by joining about 6-7 tables.
As an arbitrary test (I don't really need answers that have "what" in them), I have executed this query in SQL Server Management Studio:
SELECT *
FROM [dbo].[Course_Answers_Report] -- Course_Answers_Report is a View
WHERE question like '%what%'
Which produces the following output:
survey_setup_id | course_number | crn_number | term_offered | course_title | Instructor_Name | question_type_id | question | answer
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2617 | 107013 | 5001 | 201505 | Advanced Microsoft Access | -output omitted- | 2 | I understood what the teacher was saying. | A
2617 | 107013 | 5001 | 201505 | Advanced Microsoft Access | -output omitted- | 2 | I can apply what I learned in this class. | A
2617 | 107013 | 5001 | 201505 | Advanced Microsoft Access | -output omitted- | 2 | I understood what was expected of me in this course. | A
Now in Visual Studio I have this small bit of code. (As a side note, this is in MVC; however, debugging shows that the issue doesn't lie in MVC itself, but rather somewhere in the LINQ, Entity Framework, or controller code.)
public ActionResult modelTest()
{
    using (SurveyEntities context = new SurveyEntities())
    {
        // Log the SQL that EF generates to the debug output.
        context.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);
        // Same filter as the SQL above: rows whose question contains "what".
        var questions = context
            .Course_Answers_Report
            .Where(r => r.question.Contains("what"))
            .ToList();
        ViewBag.Questions = questions;
    }
    return View();
}
This outputs the following table on the View (again, the problem is decidedly not in the View because when debugging, the var that holds the List has all incorrect data)
survey_setup_id | course_number | crn_number | term_offered | course_title | Instructor_Name | question_type_id | question | answer
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2617 | 107013 | 5001 | 201505 | Advanced Microsoft Access | -output omitted- | 2 | I understood what the teacher was saying. | A
2617 | 107013 | 5001 | 201505 | Advanced Microsoft Access | -output omitted- | 2 | I understood what the teacher was saying | A
2617 | 107013 | 5001 | 201505 | Advanced Microsoft Access | -output omitted- | 2 | I understood what the teacher was saying. | A
As you can see, this output is incorrect: the question (or rather the record) never changes when it should.
The SQL generated by this linq statement is
SELECT
    [Extent1].[survey_setup_id] AS [survey_setup_id],
    [Extent1].[course_number] AS [course_number],
    [Extent1].[crn_number] AS [crn_number],
    [Extent1].[term_offered] AS [term_offered],
    [Extent1].[course_title] AS [course_title],
    [Extent1].[Instructor_Name] AS [Instructor_Name],
    [Extent1].[question_type_id] AS [question_type_id],
    [Extent1].[question] AS [question],
    [Extent1].[answer] AS [answer]
FROM (SELECT
        [Course_Answers_Report].[survey_setup_id] AS [survey_setup_id],
        [Course_Answers_Report].[course_number] AS [course_number],
        [Course_Answers_Report].[crn_number] AS [crn_number],
        [Course_Answers_Report].[term_offered] AS [term_offered],
        [Course_Answers_Report].[course_title] AS [course_title],
        [Course_Answers_Report].[Instructor_Name] AS [Instructor_Name],
        [Course_Answers_Report].[question_type_id] AS [question_type_id],
        [Course_Answers_Report].[question] AS [question],
        [Course_Answers_Report].[answer] AS [answer]
    FROM [dbo].[Course_Answers_Report] AS [Course_Answers_Report]) AS [Extent1]
WHERE [Extent1].[question] LIKE N'%what%'
When this SQL is run inside SQL Server Management Studio, it produces the proper results. I am at a loss as to why EF is behaving this way. Can anyone offer insight?
EDIT: Per request of Danny Varod, the EDMX can be found here http://pastebin.com/dUf6J4fV and the View can be found here http://pastebin.com/sCsqNYWc (the view is kind of ugly/sloppy as it was just supposed to be a test and experiment)

Your problem is visible in the edmx file:
warning 6002: The table/view 'wctcsurvey.dbo.Course_Answers_Report' does not have a primary key defined. The key has been inferred and the definition was created as a read-only table/view.
<EntityType Name="Course_Answers_Report">
<Key>
<PropertyRef Name="survey_setup_id" />
</Key>
You have not defined a primary key in the table, so one has been "guessed". Since the guessed column survey_setup_id is not unique in the table (all 3 rows in the correct result have the same value), EF will get confused and fetch the same object 3 times (it has the same guessed primary key after all).
If you add a correct primary key annotation to your model (i.e., a field or combination of fields that is unique for every row), the problem will disappear.
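The effect can be imitated with a toy identity map in Python. This is only a simplified simulation of the behaviour described above, not EF's actual code; the function name materialize is invented, and using (survey_setup_id, question) as a unique key is an assumption made purely for illustration.

```python
# Three distinct rows that share the same survey_setup_id, as in the question.
rows = [
    {"survey_setup_id": 2617, "question": "I understood what the teacher was saying."},
    {"survey_setup_id": 2617, "question": "I can apply what I learned in this class."},
    {"survey_setup_id": 2617, "question": "I understood what was expected of me in this course."},
]


def materialize(rows, key_columns):
    """Return one object per row, reusing the object already tracked for the same key."""
    identity_map = {}
    result = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        # If an object with this key is already tracked, it is returned instead of the new row.
        result.append(identity_map.setdefault(key, row))
    return result


if __name__ == "__main__":
    for entity in materialize(rows, ["survey_setup_id"]):
        print(entity["question"])  # the first question, three times
    for entity in materialize(rows, ["survey_setup_id", "question"]):
        print(entity["question"])  # three different questions
```

With the non-unique key, every row after the first resolves to the object already tracked, which matches the repeated record seen in the view.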

What to do when missing some data in a date series?

I am trying to graph a count over time from multiple sources, but having issues when the collection job fails on one (or more, but not all) of the sources.
Suppose I have a set of data like:
date | count
---------------------
10-11-2013 | 50
11-11-2013 | 52
13-11-2013 | 63
and another like
date | count
---------------------
10-11-2013 | 15
11-11-2013 | 19
12-11-2013 | 17
13-11-2013 | 20
For whatever reason I am missing the data entry on the 12th for the first one. If I am just working with this single object then I can graph it fine by just skipping that element, and the line will just be inaccurate on that day.
The problem I get is when I have multiple sources, and at least one of them succeeded in reporting its results for that day. I have a queryset that gets a sum of all the daily counts:
DailyCount.objects.values('date').annotate(count=Sum('count')).order_by('date')
The results from this show a much lower number on the entry for the 12th, making the graph look very wrong whenever this happens.
date | count
---------------------
10-11-2013 | 65
11-11-2013 | 71
12-11-2013 | 17
13-11-2013 | 83
Is there a way to have my queryset use the previous date's count if it doesn't exist? I thought about adding the previous day's count to the database, but it doesn't seem right to be adding some (probably wrong) data to the database when I can't verify it.
Ideally, I think it would look like:
date | count
---------------------
10-11-2013 | 65
11-11-2013 | 71
12-11-2013 | 69
13-11-2013 | 83

It depends on how you display the graph. pandas can store time series data as well, and it provides exactly the functionality you describe: backfilling or forward filling any missing values by using a previous or future value (e.g., pandas.DataFrame.fillna, or the ffill and bfill methods). On the one hand, using that library for just that functionality is overkill, but you may find it useful if you're planning on doing more data manipulation.
I don't think a Django QuerySet can fill in missing values, as it was not built to do that. However, you could compute it manually by taking the values from the query result and computing the right daily values before displaying the graph.
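The manual computation could look roughly like the following plain-Python sketch. It assumes the rows are fetched per source (the source labels "a" and "b", the tuple layout, and the function name are all hypothetical) and that dates are ISO strings so they sort chronologically. A source with no earlier value contributes 0 until its first report.

```python
from collections import defaultdict

# Hypothetical (source, date, count) rows mirroring the two tables in the question.
rows = [
    ("a", "2013-11-10", 50),
    ("a", "2013-11-11", 52),
    ("a", "2013-11-13", 63),
    ("b", "2013-11-10", 15),
    ("b", "2013-11-11", 19),
    ("b", "2013-11-12", 17),
    ("b", "2013-11-13", 20),
]


def forward_filled_totals(rows):
    """Sum counts per date, reusing a source's previous count when a day is missing."""
    by_source = defaultdict(dict)
    for source, date, count in rows:
        by_source[source][date] = count

    all_dates = sorted({date for _, date, _ in rows})
    totals = dict.fromkeys(all_dates, 0)
    for counts in by_source.values():
        last_seen = 0  # nothing to carry forward before the first report
        for date in all_dates:
            last_seen = counts.get(date, last_seen)
            totals[date] += last_seen
    return totals


if __name__ == "__main__":
    for date, total in forward_filled_totals(rows).items():
        print(date, total)
```

Nothing is written back to the database, so the estimated value for the missing day only ever exists in the data handed to the graph.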

Match similar zip codes

Background
To replace invalid zip codes.
Sample Data
Consider the following data set:
Typo | City | ST | Zip5
-------+------------+----+------
33967 | Fort Myers | FL | 33902
33967 | Fort Myers | FL | 33965
33967 | Fort Myers | FL | 33911
33967 | Fort Myers | FL | 33901
33967 | Fort Myers | FL | 33907
33967 | Fort Myers | FL | 33994
34115 |Marco Island| FL | 34145
34115 |Marco Island| FL | 34146
86405 | Kingman | FL | 86404
86405 | Kingman | FL | 86406
33967 closely matches 33965, although 33907 could also be correct. (In this case, 33967 is a valid zip code, but not in our zip code database.)
34115 closely matches 34145 (off by one digit, with a difference of 3 for that digit).
86405 closely matches both 86404 and 86406.
Sometimes digits are simply reversed (e.g., 89 instead of 98).
Question
How would you write a SQL statement that finds the "minimum distance" between multiple numbers that have the same number of digits, returning at most one result no matter what?
Ideas
Subtract the digits.
Use LIMIT 1.
Conditions
PostgreSQL 8.3

This sounds like a case for Levenshtein distance.
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
It looks like PostgreSQL provides it through the fuzzystrmatch contrib module:
test=# SELECT levenshtein('GUMBO', 'GAMBOL');
 levenshtein
-------------
           2
(1 row)
http://www.postgresql.org/docs/8.3/static/fuzzystrmatch.html
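To show what the distance does with the sample data, here is a small Python sketch of the textbook dynamic-programming Levenshtein algorithm plus a helper that returns a single closest candidate. It is not PostgreSQL's implementation. The helper name closest_zip and the tie-break on numeric difference (borrowed from the "subtract the digits" idea in the question) are assumptions; any remaining tie goes to the first candidate listed.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_zip(typo, candidates):
    """Return exactly one candidate: smallest edit distance, then smallest numeric gap."""
    return min(
        candidates,
        key=lambda zip5: (levenshtein(typo, zip5), abs(int(typo) - int(zip5))),
    )


if __name__ == "__main__":
    fort_myers = ["33902", "33965", "33911", "33901", "33907", "33994"]
    print(closest_zip("33967", fort_myers))
    print(closest_zip("34115", ["34145", "34146"]))
```

Note that plain Levenshtein counts two reversed digits as two edits, so transpositions are not favoured over any other two-digit difference.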

Redfilter answered the question that was asked, but I just wanted to clarify that the requested solution will not resolve what appears to be the real problem.
The real problem here seems to be that you have a database which was hand keyed, and some numbers were transcribed incorrectly, giving garbage data.
The ONLY way to solve this problem is to validate the full address against a database like the USPS, MapQuest, or another provider. I know the first two have APIs available for doing this.
The example I gave in a comment above was to consider a zip of 75084 and a city value of Richardson. Richardson has the zip codes 75080, 75081, 75082, 75083, and 75085. The minimum number of edits will be 1. However, which one is correct?
An equally difficult case: what if the entered zip code was 75083 for Richardson? That is a valid zip code for that city, but what if the address actually resides in 75082?
The only way to catch that is to have the full address validated.