Quartile for subgroups in SQL Server 2008 - sql

I have a table with the times athletes of a sport club take to run a lap around the field . Each athlete has several entries in that table for each time they run and and for statistics purposed I need to gather some statistics regarding the time they take.
I already have the basic statistics like average time, median time, etc.... However I have no idea how to exactly do the bottom and top quartiles.
I see some examples for quartiles of a table if you just want the statistics of the whole table (in this case the whole club) but I have no idea how to make them for sub groups like distinct athletes of a table, could anyone give me point me on the right direction/give me an example?
The relevant data is in a very simple structure like this (there are more columns but in this case they don't matter)
LAP_ID | ATHLETE| TIME |
1 | Ath_X | 120 |
2 | Ath_Y | 160 |
3 | Ath_X | 90 |
4 | Ath_X | 80 |
5 | Ath_Z | 113 |
6 | Ath_X | 115 |
EDIT:There seems to be some misunderstanding, by Quartile I mean the 1st and 3rd Quartile, that is the place where it splits off the lowest 25% of data from the highest 75% and the place where it splits off the highest 25% of data from the lowest 75%.

Related

How to track day wise change in Power BI?

I am creating a Power BI report using data from https://www.mohfw.gov.in/ website which provides latest corona virus data for all Indian states/union territories.
Data is in below format -
+-----+-----------------------------+-----------+-------+-------+
| SNo | State | Confirmed | Cured | Death |
+-----+-----------------------------+-----------+-------+-------+
| 1 | Andaman and Nicobar Islands | 14 | 11 | 0 |
| 2 | Andhra Pradesh | 603 | 42 | 15 |
| 3 | Arunachal Pradesh | 1 | 0 | 0 |
| 4 | Assam | 35 | 12 | 1 |
| 5 | Bihar | 86 | 37 | 2 |
They website is refreshed with new data everyday, so there is no date wise tracker. I wanted to track the day wise change(increment/decrements) in cases for every state, is there any way I can model it in power BI to achieve this?
For now what I am doing is I am downloading the table from the web page everyday and adding a date column which will be today's date(getdate()) and loading the data into a SQL table. So everyday I am inserting a row for each of the state with that day's date-stamp in the table and then I can subtract it from previous day's data to see the changes, but I feel it is a inefficient way and the table size keep on increasing everyday.
So any suggestion to improve it, either by some changes in Power BI data model, or in SQL will be much appreciated.
Context
considering the data source is updated according to SCD 1 (Overwriting) the only way to track day wise change is to historize data every day. In practice, schedule a daily load of the data source and store the new data of that day.
Answer
You are implementing SCD 2 (Create a new record on change) in the correct way. It is important to make sure adding a technical field to each record with the timestamp when it was generated so you can study the trend later.
Extra
You could eventually optimize this approach by normalizing the model in order to reduce the size of the table you are applying SCD 2 (Create a new record on change).
Please let me give a simple example. Consider a table with:
only 1 record
1000 fields of which only 1 field (LAST_UPDATE) can change using SCD 2 (Create a new record on change)
If LAST_UPDATE changes 100,000 times a day, every days it triggers the creation of 100,000 new version of the same record (because we track its changes). Therefore, after one year the table would have still 1,000 fields and 36,500,000 records. Instead, if we normalize the model such that LAST_UPDATE field (historized with SCD 2) is stored in a separate table, after one year we would have one table with 1 record and 999 columns, and a different table with 1 column and 36,500,000 records.
In the case your database is a row database, you would much benefit from normalizing the model. Instead, if your database is columnar database, everything is already taken care of because each column is individually compressed instead of compressing row-wise.

How to join between table DurationDetails and Table cost per program

How to design database for tourism company to calculate cost of flight and hotel per every program tour based on date ?
what i do is
Table - program
+-----------+-------------+
| ProgramID | ProgramName |
+-----------+-------------+
| 1 | Alexia |
| 2 | Amon |
| 3 | Sfinx |
+-----------+-------------+
every program have more duration may be 8 days or 15 days only
it have two periods only 8 days or 15 days .
so that i do duration program table have one to many with program .
Table - ProgramDuration
+------------+-----------+---------------+
| DurationNo | programID | Duration |
+------------+-----------+---------------+
| 1 | 1 | 8 for Alexia |
| 2 | 1 | 15 for Alexia |
+------------+-----------+---------------+
And same thing to program amon program and sfinx program 8 and 15 .
every program 8 or 15 have fixed details for every day as following :
Table Duration Details
+------+--------+--------------------+-------------------+
| Days | Hotel | Flight | transfers |
+------+--------+--------------------+-------------------+
| Day1 | Hilton | amsterdam to luxor | airport to hotel |
| Day2 | Hilton | | AbuSimple musuem |
| Day3 | Hilton | | |
| Day4 | Hilton | | |
| Day5 | Hilton | Luxor to amsterdam | |
+------+--------+--------------------+-------------------+
every program determine starting by flight date so that
if flight date is 25/06/2017 for program alexia 8 days it will be as following
+------------+-------+--------+----------+
| Date | Hotel | Flight | Transfer |
+------------+-------+--------+----------+
| 25/06/2017 | 25 | 500 | 20 |
| 26/06/2017 | 25 | | 55 |
| 27/06/2017 | 25 | | |
| 28/06/2017 | 25 | | |
| 29/06/2017 | 25 | 500 | |
+------------+-------+--------+----------+
And this is actually what i need how to make relations ship to join costs with program .
for flight and hotel costs as above ?
for 5 days cost will be 1200
25 is cost per day for hotel Hilton
500 is cost for flight
20 and 55 is cost per transfers
image display what i need
relation between duration and cost
Truthfully, I don't fully understand exactly what you're trying to accomplish. Your description is not clear, your tables seem to be missing information / contain information that should not be in your tables, and the way that I'm understanding your description doesn't really make sense based on the UI screenshot that you shared.
It looks like you're working on an application for a travel agency which will allow agents to create an itinerary for a trip. They can give this trip a name (so if a particular package is a hit with customers, they can just offer the "Alexa" package), and the utility will calculate the total estimated cost of the trip. If I understand correctly, the trips will be either 8, or 15 days long.
Personally, I would delete the "ProgramDuration" table altogether. If there are two versions of the Alexa trip at index 1, then you're going to run into all manners of issues. I can get into the details of why this is a bad idea, but unless you're really hung up on having this ProgramDuration table, it's not worth the time. You should add a "duration" field to your "program" table, and assign a new ProgramID for each different duration version of the "Alexa" program.
Your table "Duration details" also misses the mark. Your fields in this table will make it harder to add new features to your application down the line. You should have a field "ProgramID," which we will use to join this table against the program table later. You should have a field "Day" which obviously indicates the day in the itinerary. You should have only one more field "ItemID." We're going to use the "ItemID" field to join your itinerary against a new items table we're going to create.
Your items table is where you define all of the items that can possibly appear in an itinerary. Your current itinerary table has three possible "types" of expenses, flights, hotels, and transfers. What if your travel agents want to start adding meal expenditures into their itineraries / budgets? What about activities that cost money? What about currency exchange fees? What about items that your clientele will need before their trip (wall adapters, luggage, etc.)? In your items table, you will have fields for an ItemID, ItemName, ItemUnitPrice, and ItemType. A possible item is as follows:
ItemID: 1, ItemName: Night At The Hilton, ItemUnitPrice: 300, ItemType: Lodging
Using the "SELECT [Column] AS [Alias]" syntax with some CTEs or subqueries and the JOIN operator, we can easily reconstitute a table that looks like your "Program Duration Details" table, but we will be afforded considerably more flexibility to add or remove things later down the line.
In the interests of security and programmability, I would also add a table called "ItemTypeTable" with a single field "TypeName." You can use this table to prevent unauthorized users from defining new item types, and you can use this table to create drop down menus, navigation, and all manners of other useful features. There might be cleaner implementations, but this shouldn't represent a serious performance or size hit.
All in all, at the risk of being somewhat rude, it seems like you're trying to take on a rather large, sophisticated task with a very rudimentary understanding of basic relational database design and implementation. If you are doing this in a professional context, I would strongly encourage you to consider consulting with another professional that may be more experienced in this area.

Subtract two aggregated values in Bar Chart

My data is like -
+-----------+------------------+-----------------+-------------+
| Issue Num | Created On | Closed at | Issue Owner |
+-----------+------------------+-----------------+-------------+
| 1 | 12/21/2016 15:26 | 1/13/2017 9:48 | Name 1 |
| 2 | 1/10/2017 7:38 | 1/13/2017 9:08 | Name 2 |
| 3 | 1/13/2017 8:57 | 1/13/2017 8:58 | Name 2 |
| 4 | 12/20/2016 20:30 | 1/13/2017 5:46 | Name 2 |
| 5 | 12/21/2016 19:30 | 1/13/2017 1:14 | Name 1 |
| 6 | 12/20/2016 20:30 | 1/12/2017 9:11 | Name 1 |
| 7 | 1/9/2017 17:44 | 1/12/2017 1:52 | Name 1 |
| 8 | 12/21/2016 19:36 | 1/11/2017 16:59 | Name 1 |
| 9 | 12/20/2016 19:54 | 1/11/2017 15:45 | Name 1 |
+-----------+------------------+-----------------+-------------+
What I am trying to achieve is
Number of issues created per week
Number of issues closed per week
Net number of issues remaining per week
I am able to resolve the top two points but unable to approach the last.
My attempt -
This gives me number of issues created every week.
Similarly I have done for Closed per week.
For Net number of issues (Created-Closed) -
I tried adding Closed At column along with Created On but I can't see second bar in the chart along with Created On either.
Something like this
I tried doing the same in excel -
I want something of this sort but with another column as the difference of
number of issues created that week - number of issues closed that week.
In this case, 8-6=2.
You could use a calculated field(Analysis->Create Calculated Field). Something like this:
{FIXED [Create Date]:Count(if DATEPART('year',[Create Date]) = 2016 then [Number of Records] end)} - {FIXED [Closed Date]:Count(if DATEPART('year',[Closed Date]) = 2016 then [Number of Records] end)}
This function is using LOD expressions to pull back both sets of values. It will filter on all 2016 results for both date sets and then minus them from each other.
For more on LOD's see here:
https://www.tableau.com/about/blog/LOD-expressions
Use this as your measure and pull in one of your date fields as the dimension.
The normal way to solve this problem is to reshape the data so you have one row per status change instead of one row per issue, with a column named [Date] and a column named [Action]. The action can be submit and close (or in a more complex world include approve, reject, whatever - tracking the history.
You can do the reshaping without modifying your source data by using a UNION to get two copies of each row with appropriate calculated fields to make the visible columns make sense (e.g., create calculated a field called Date that returns the submission date or closing date depending on whether the row is from the first or second union, with a similar one called Action whose value depends on that as well. Filter out Close actions that have a null date)
Or you can preprocess the data to reshape it.
Or you can use data blending to make two sources that point to the same data source but customizing the linking fields to line up the submit and close dates (e.g., duplicate the data connection and rename both date fields to have the same name). But in this case, you probably want to create scaffolding source that has every date, but no other data, to use as the primary data source to avoid filtering out data from the secondary for dates that don't appear in the primary. The blending approach can be brittle.
Assuming you used the UNION approach instead of Data Blending, then you can count the number of submissions and closures within a certain date range, or compute a running total of the difference to see the backlog size over time.

How do you read two-way tables?

I want to know what is two-way tables in SQL?
And how can i read these two-way tables
Two-way tables is no way of storing data, but of displaying data. It doesn't say anything about how the data is stored.
Let's say we store persons along with their IQ and the country they live in. The table may look like this:
name iq country
John Smith 125 GB
Mary Jones 150 GB
Juan Lopez 150 ES
Liz Allen 125 GB
The two-way table to show the relation between IQ and country would be:
| 125 | 150
---+------+----
GB | 2 | 1
ES | 0 | 1
or
| GB | ES
----+-----+---
125 | 2 | 0
150 | 1 | 1
In order to retrieve this data from the database you might write this query:
select iq, country, count(*)
from persons
group by iq, country;
SQL is meant to retrieve data; it is not really meant to care about it's presentation, the layout. So you'd write a program (in PHP, C#, Java, whatever) sending the query to the database, receiving the data and filling a GUI grid in a loop.
In rare cases SQL can provide the layout itself, i.e. give you the data in columns and rows. This is when the attributes of one dimensions are known beforehand. This is usually not the case with IQs or countries as in the example given (i.e. you wouldn't have known which countries and which IQs are present in the table, if I hadn't shown you). But of course you could have retrieved either the countries or the IQs first and then build the main query dynamically (with conditional aggregation or pivot). Another case when values are known beforehand is booleans, e.g. a flag in the persons table to show whether a person is homeless. If you wanted results for how many homeless persons in which countries, you could easily write a query with two columns for homeless and not homeless.
As mentioned: that you can display data in a two-way table doesn't say anything about how this data is stored. Above I showed you a one table example. But let's say you have stores in many cities and want to know in which cities live thinner or thicker people. You decide to check which t-shirt sizes you sold in which cities. So you take your clients orders, look up the clients and the cities they live in. You also look up the order details and the items they refer to, then take all items of type t-shirt. There are many tables involved, but the result is again a two-sided table showing the relation of two attributes. E.g:
city | S | M | L | XL
------------+-----+-----+-----+-----
New York | 5% | 8% | 7% | 10%
Los Angeles | 10% | 7% | 7% | 8%
Chicago | 1% | 4% | 6% | 11%
Houston | 2% | 2% | 5% | 7%

Access Query: get difference of dates with a twist

I'm going to do my best to explain this so I apologize in advance if my explanation is a little awkward. If I am foggy somewhere, please tell me what would help you out.
I have a table filled with circuits and dates. Each circuit gets trimmed on a time cycle of about 36 months or 48 months. I have a column that gives me this info. I have one record for every time the a circuit's trim cycle has been completed. I am attempting to link a known circuit outage list, to a table with their outage data, to a table with the circuit's trim history. The twist is the following:
I only want to get back circuits that have exceeded their trim cycles by 6 months. So I would need to take all records for a circuit, look at each individual record, find the most recent previous record relative to the record currently being examined (I will need every record examined invididually), calculate the difference between the two records in months, then return only the records that exceeded 6 months of difference between any two entries for a given feeder.
Here is an example of the data:
+----+--------+----------+-------+
| ID | feeder | comp | cycle |
| 1 | 123456 | 1/1/2001 | 36 |
| 2 | 123456 | 1/1/2004 | 36 |
| 3 | 123456 | 7/1/2007 | 36 |
| 4 | 123456 | 3/1/2011 | 36 |
| 5 | 123456 | 1/1/2014 | 36 |
+----+--------+----------+-------+
Here is an example of the result set I would want (please note: cycle can vary by circuit, so the value in the cycle column needs to be in the calculation to determine if I exceeded the cycle by 6 months between trimmings):
+----+--------+----------+-------+
| ID | feeder | comp | cycle |
| 3 | 123456 | 7/1/2007 | 36 |
| 4 | 123456 | 3/1/2011 | 36 |
+----+--------+----------+-------+
This is the query I started but I'm failing really hard at determining how to make the date calculations correctly:
SELECT temp_feederList.Feeder, Temp_outagesInfo.causeType, Temp_outagesInfo.StormNameThunder, Temp_outagesInfo.deviceGroup, Temp_outagesInfo.beginTime, tbl_Trim_History.COMP, tbl_Trim_History.CYCLE
FROM (temp_feederList
LEFT JOIN Temp_outagesInfo ON temp_feederList.Feeder = Temp_outagesInfo.Feeder)
LEFT JOIN tbl_Trim_History ON Temp_outagesInfo.Feeder = tbl_Trim_History.CIRCUIT_ID;
I wasn't really able to figure out where I need to go from here to get that most recent entry and perform the mathematical comparison. I've never been asked to do SQL this complex before, so I want to thank all of you for your patience and any assistance you're willing to lend.
I'm making some assumptions, but this uses a subquery to give you rows in the feeder list where the previous completed date was greater than the number of months ago indicated by the cycle:
SELECT tbl_Trim_History.ID, tbl_Trim_History.feeder,
tbl_Trim_History.comp, tbl_Trim_History.cycle
FROM tbl_Trim_History
WHERE tbl_Trim_History.comp>
(SELECT Max(DateAdd("m", tbl_Trim_History.cycle, comp))
FROM tbl_Trim_History T2
WHERE T2.feeder = tbl_Trim_History.feeder AND
T2.comp < tbl_Trim_History.comp)
If you needed to check for longer than 36 months you could add an arbitrary value to the months calculated by the DateAdd function.
Also I don't know if the value of cycle specified the number of month from the prior cycle or the number of months to the next one. If the latter I would change tbl_Trim_History.cycle in the DateAdd function to just cycle.
SELECT tbl_trim_history.ID, tbl_trim_history.Feeder,
tbl_trim_history.Comp, tbl_trim_history.Cycle,
(select max(comp) from tbl_trim_history T
where T.feeder=tbl_trim_history.feeder and
t.comp<tbl_trim_history.comp) AS PriorComp,
IIf(DateDiff("m",[priorcomp],[comp])>36,"x") AS [Select]
FROM tbl_trim_history;
This query identifies (with an X in the last column) the records from tbl_trim_history that exceed the cycle time - but as noted in the comments I'm not entirely sure if this is what you need or not, or how to incorporate the other 2 tables. Once you see what it is doing you can modify it to only keep the records you need.