I have a general question about the star model (star schema) in a business intelligence project.
For example, suppose the project consists of one fact table (F) and 3 dimensions (D1, D2, D3).
Furthermore, assume that the fact table looks like this:
d11 d21 d21 m11 m21 m21
d12 d22 d22 m12 m22 m22
d13 d23 d23 m13 m23 m23
d14 d24 d24 m14 m24 m24
d15 d25 d25 m15 m25 m25
d16 d26 d26 m16 m26 m26
d17 d27 d27 m17 m27 m27
Here, for example, d23 means dimension no. 2, value no. 3 in that dimension
(the same notation applies to the measures).
Now assume that a selection is made on each of the 3 dimensions and that the following parts of the fact table are selected:
d11 d21 d21 m11 m21
D12 d22 D22 m12 m22
D13 D23 D23 m13 m23
D14 D24 D24 m14 m24
d15 D25 D25 m15 m25
d16 d26 D26 m16 m26
d17 d27 d27 m17 m27
Now I would like to know which of the selections (marked with an uppercase 'D') will, should, or need to be considered in a star model.
If the OUTER JOIN principle is applied, the following rows will be selected:
D12 d22 D22 m12 m22
D13 D23 D23 m13 m23
D14 D24 D24 m14 m24
d15 D25 D25 m15 m25
d16 d26 D26 m16 m26
i.e. for the first measure the values (m12, m13, m14, m15, m16) will be considered, and for the second measure (m22, m23, m24, m25, m26).
On the other hand, if an INNER JOIN is used between the fact table and the dimension tables, the result will be the following selection:
D13 D23 D23 m13 m23
D14 D24 D24 m14 m24
i.e. the corresponding aggregate functions would consider (m13, m14) for the first measure and (m23, m24) for the second.
Which of these two approaches is actually applied?
I cannot speak for all systems, but the standard approach is to make the selections on the dimension tables, which are then INNER JOINed to the fact table to filter the data. The fact table is in turn INNER JOINed to any dimensions that have no selections but whose columns are shown in the result.
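To make the INNER JOIN behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (f, d1, d2, d3), the key columns, and the measure values (m1 = row number, m2 = 10 times the row number) are assumptions made up for illustration; only the selected rows per dimension follow the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Three tiny dimension tables with keys 1..7 (hypothetical layout).
for dim in ("d1", "d2", "d3"):
    cur.execute(f"CREATE TABLE {dim} (k INTEGER PRIMARY KEY, label TEXT)")
    cur.executemany(f"INSERT INTO {dim} VALUES (?, ?)",
                    [(i, f"{dim}{i}") for i in range(1, 8)])

# Fact table: row i references value i in every dimension.
# The measure values are invented for this sketch.
cur.execute("CREATE TABLE f (k1 INTEGER, k2 INTEGER, k3 INTEGER, m1 INTEGER, m2 INTEGER)")
cur.executemany("INSERT INTO f VALUES (?, ?, ?, ?, ?)",
                [(i, i, i, i, 10 * i) for i in range(1, 8)])

# Selections from the question: D1 on rows 2-4, D2 on rows 3-5, D3 on rows 2-6.
inner_sql = """
SELECT f.k1, f.m1, f.m2
FROM f
JOIN d1 ON d1.k = f.k1 AND d1.k IN (2, 3, 4)
JOIN d2 ON d2.k = f.k2 AND d2.k IN (3, 4, 5)
JOIN d3 ON d3.k = f.k3 AND d3.k IN (2, 3, 4, 5, 6)
ORDER BY f.k1
"""
inner_rows = cur.execute(inner_sql).fetchall()
print(inner_rows)  # only fact rows 3 and 4 satisfy all three selections
```

Only the fact rows that match every filtered dimension survive, which corresponds to the (m13, m14) and (m23, m24) case described in the question.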
I have 2 tables with similar information. Let's call them DAILYROWDATA and SUMMARYDATA.
Table DAILYROWDATA
NIP NAME DEPARTMENT
A1 ARIA BB
A2 CHLOE BB
A3 RYAN BB
A4 STEVE BB
Table SUMMARYDATA
NIP  NAME   DEPARTMENT  STATUSIN              STATUSOUT
A1   ARIA   BB          1/21/2020 8:06:23 AM  1/21/2020 8:07:53 AM
A2   CHLOE  BB          1/21/2020 8:16:07 AM  1/21/2020 9:51:21 AM
A1   ARIA   BB          1/22/2020 9:06:23 AM  1/22/2020 10:07:53 AM
A2   CHLOE  BB          1/22/2020 9:16:07 AM  1/22/2020 10:51:21 AM
A3   RYAN   BB          1/22/2020 8:15:03 AM  1/22/2020 9:12:03 AM
I need to combine these two tables and show every row of DAILYROWDATA; where STATUSIN and STATUSOUT are NULL, the output should show 'NA' instead. This is the output I mean:
NIP  NAME   DEPARTMENT  STATUSIN              STATUSOUT
A1   ARIA   BB          1/21/2020 8:06:23 AM  1/21/2020 8:07:53 AM
A2   CHLOE  BB          1/21/2020 8:16:07 AM  1/21/2020 9:51:21 AM
A3   RYAN   BB          NA                    NA
A4   STEVE  BB          NA                    NA
A1   ARIA   BB          1/22/2020 9:06:23 AM  1/22/2020 10:07:53 AM
A2   CHLOE  BB          1/22/2020 9:16:07 AM  1/22/2020 10:51:21 AM
A3   RYAN   BB          1/22/2020 8:15:03 AM  1/22/2020 9:12:03 AM
A4   STEVE  BB          NA                    NA
One more condition: STATUSIN and STATUSOUT should be 'NA' only when a given NIP, NAME, DEPARTMENT combination has no STATUSIN/STATUSOUT row on a particular date, so the same person can appear once per date.
You want a left join to bring the two tables together. The trickier part is that you need strings in order to represent the 'NA':
select drd.*,
       coalesce(cast(sd.statusin as varchar(255)), 'NA') as statusin,
       coalesce(cast(sd.statusout as varchar(255)), 'NA') as statusout
from DAILYROWDATA drd left join
     SUMMARYDATA sd
     on drd.nip = sd.nip;
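The expected output in the question has one row per person per date, which a plain left join on NIP does not produce by itself. One possible way to get there, sketched below with Python's built-in sqlite3 module, is to pair every DAILYROWDATA row with every distinct date found in SUMMARYDATA and then left join on both NIP and date. This is only a sketch under assumptions: the timestamps are stored as ISO-8601 text so that SQLite's date() function can read them, and the alias names (days, day) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DAILYROWDATA (NIP TEXT, NAME TEXT, DEPARTMENT TEXT);
CREATE TABLE SUMMARYDATA (NIP TEXT, NAME TEXT, DEPARTMENT TEXT,
                          STATUSIN TEXT, STATUSOUT TEXT);
INSERT INTO DAILYROWDATA VALUES
  ('A1','ARIA','BB'), ('A2','CHLOE','BB'), ('A3','RYAN','BB'), ('A4','STEVE','BB');
-- Same rows as in the question, rewritten as ISO-8601 text (assumption).
INSERT INTO SUMMARYDATA VALUES
  ('A1','ARIA','BB','2020-01-21 08:06:23','2020-01-21 08:07:53'),
  ('A2','CHLOE','BB','2020-01-21 08:16:07','2020-01-21 09:51:21'),
  ('A1','ARIA','BB','2020-01-22 09:06:23','2020-01-22 10:07:53'),
  ('A2','CHLOE','BB','2020-01-22 09:16:07','2020-01-22 10:51:21'),
  ('A3','RYAN','BB','2020-01-22 08:15:03','2020-01-22 09:12:03');
""")

query = """
SELECT drd.NIP, drd.NAME, drd.DEPARTMENT,
       COALESCE(sd.STATUSIN, 'NA')  AS STATUSIN,
       COALESCE(sd.STATUSOUT, 'NA') AS STATUSOUT
FROM DAILYROWDATA AS drd
CROSS JOIN (SELECT DISTINCT date(STATUSIN) AS day FROM SUMMARYDATA) AS days
LEFT JOIN SUMMARYDATA AS sd
       ON sd.NIP = drd.NIP AND date(sd.STATUSIN) = days.day
ORDER BY days.day, drd.NIP
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The cross join produces the person-by-date grid, and the left join fills in the timestamps where they exist; COALESCE supplies 'NA' for the gaps.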
route_number source_id latitude_value longitude_value no_of_stores
r1 676 28.15085 32.66055 23
r2 715 28.2160253 32.5214831 23
r3 345 28.2123115 32.537211 22
r4 150 28.23009 32.50323 23
r5 534 28.0949248 32.8075467 21
r6 1789 28.2204214 32.5035782 22
r7 647 28.21548 32.50238 23
r8 667 28.21132 32.51481 22
r9 2242 28.2389 32.5 19
r10 797 28.161657 32.8416816 20
r11 1097 28.1792849 32.8255522 19
r12 591 28.2513623 32.7638247 22
r13 1091 28.251208 32.7808329 21
r14 1267 28.2102213 32.8129836 21
r15 1016 28.1654648 32.8350845 19
r16 785 28.0786012 32.9513468 4
r17 1072 28.1701673 32.8382309 1
The table above is the dataframe I am working with.
As you can see, the number of stores differs from one route_number to another.
mean(no_of_stores) is about 19.1 in this case
What I am looking for is this:
depending on the geo-locations (latitude and longitude values) of my source_id, I want to combine multiple routes that lie close to each other into one, such that no_of_stores is divided about equally among the new groups.
The closeness condition can be dropped; simply merging the routes with fewer stores into one would also be acceptable.
i.e. routes that lie close to each other (and whose no_of_stores is below mean(no_of_stores)) should be combined into one bigger route, so that no_of_stores in each new route is close to the mean of the no_of_stores column, which in this case is around 19.
Final output expected something like this: (not actual)
route_number new_route_no
r1 A1 #since it already has more stores than the mean
r2 A2
r3 A3
r4 A4
....................
r9 A9 #(19 stores)
r17 A9 #(1 stores) total 20
....................
r11 A11
r16 A11
r15 A15 #19 stores; it cannot be combined further, so keep it as it is
I have tried using pandas groupby and aggregate methods, but couldn't find a way to transform this dataframe.
Any leads would be helpful.
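One possible starting point, ignoring geography (which the question allows), is a greedy first-fit-decreasing grouping of the routes whose store count is below the mean. The sketch below is plain Python rather than pandas; the function name merge_small_routes and the capacity of 23 stores per merged route (the largest count in the data) are assumptions chosen for illustration, and the grouping it produces is one valid answer, not the only one.

```python
from statistics import mean

# Store counts copied from the dataframe in the question.
stores = {
    "r1": 23, "r2": 23, "r3": 22, "r4": 23, "r5": 21, "r6": 22,
    "r7": 23, "r8": 22, "r9": 19, "r10": 20, "r11": 19, "r12": 22,
    "r13": 21, "r14": 21, "r15": 19, "r16": 4, "r17": 1,
}

def merge_small_routes(stores, capacity):
    """Group routes below the mean so that no group exceeds `capacity` stores."""
    threshold = mean(list(stores.values()))
    # Largest small routes first (first-fit decreasing).
    small = sorted((r for r in stores if stores[r] < threshold),
                   key=lambda r: -stores[r])
    groups = []
    for route in small:
        for group in groups:
            if group["total"] + stores[route] <= capacity:
                group["routes"].append(route)
                group["total"] += stores[route]
                break
        else:
            # No existing group has room, so start a new one.
            groups.append({"routes": [route], "total": stores[route]})

    # Routes at or above the mean keep their own number.
    mapping = {route: "A" + route[1:] for route in stores}
    for group in groups:
        label = "A" + group["routes"][0][1:]
        for route in group["routes"]:
            mapping[route] = label
    return mapping

new_routes = merge_small_routes(stores, capacity=23)
for route, label in new_routes.items():
    print(route, label)
```

To take the coordinates into account, the inner loop could prefer the nearest group that still has room instead of the first one; that refinement is left out here.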
CNum DNum RNum Quant Price
C100 D1 R10 2 8.99
C100 D1 R40 7 9.99
C200 D3 R10 4 16.99
C200 D3 R20 2 15.99
C200 D3 R30 2 17.99
C200 D3 R40 5 19.99
C200 D3 R50 6 18.99
C200 D3 R60 4 19.99
C200 D3 R70 8 15.99
C200 D5 R20 1 8.99
C300 D3 R10 2 16.99
C300 D4 R20 5 22.99
C400 D6 R30 3 4.99
C400 D6 R70 3 2.99
C500 D1 R40 1 9.99
C500 D2 R20 2 23.99
C500 D2 R40 1 24.99
C500 D3 R40 2 19.99
C500 D4 R40 8 23.99
C500 D5 R40 4 8.99
C500 D5 R50 5 8.99
C500 D5 R70 1 9.99
C500 D6 R20 2 1.99
C500 D6 R40 5 3.99
The table above is named Orders. The query I'm trying to write is stated as "For each dish ordered from a restaurant, get the dish number (DNum), the restaurant number (RNum), and the total quantity (for that dish ordered from that restaurant)." I can get the two numbers to populate, but I am totally unsure how to add up the quantities; everything I've tried just adds up all the quantities in total. Any ideas?
Here is one of the queries I tried. This one actually returned an error: "Your query does not include the specified expression 'DNum' as part of an aggregate function."
SELECT Ord1.DNum, Ord2.DNum, SUM(Ord1.Quant + Ord2.Quant) AS TotQuant
FROM Orders AS Ord1, Orders AS Ord2
WHERE (Ord1.RNum = Ord2.RNum)
Another that's not working:
SELECT Order1.DNum, Order2.DNum, TotQuant
FROM (SELECT SUM(Order1.Quant + Order2.Quant) AS TotQuant
FROM Orders AS Order1, Orders AS Order2
WHERE (Order1.RNum = Order2.RNum)
AND (Order1.DNum = Order2.DNum))
And one more:
SELECT DISTINCT Ord1.DNum, SUM(Ord1.Quant + Ord2.Quant) AS TotQuant
FROM Orders AS Ord1, Orders AS Ord2
WHERE (Ord1.RNum = Ord2.RNum)
AND (Ord1.DNum = Ord2.DNum)
If my guess as to what you are trying to do is correct, something like this should get you close:
SELECT DNum, RNum, SUM(Quant) AS TotalQuantity
FROM Orders
GROUP BY DNum, RNum
OK, so some quick comments on what you have tried:
Query 1
SELECT Ord1.DNum, Ord2.DNum, SUM(Ord1.Quant + Ord2.Quant) AS TotQuant
FROM Orders AS Ord1, Orders AS Ord2
WHERE (Ord1.RNum = Ord2.RNum);
This might seem like it should work, but if you think about it, it is not a meaningful query. The self-join pairs every order with every other order from the same restaurant, so quantities get counted several times, and because DNum is selected next to SUM() without a GROUP BY, the database rejects the query with the error you saw.
Query 2 and Query 3 will not work either, for much the same reasons as the first query that returns an error. They are slightly different, but essentially you are asking for the wrong thing.
Now here's what you can try:
Introducing the GROUP BY method! Woohoo!
GROUP BY is perfect for this and many other queries! As stated on the w3schools page for it:
The GROUP BY statement is often used with aggregate functions (COUNT,
MAX, MIN, SUM, AVG) to group the result-set by one or more columns.
So try a query like this:
SELECT Orders.DNum, Orders.RNum, SUM(Orders.Quant) AS OrderQuantity
FROM Orders
GROUP BY Orders.DNum, Orders.RNum;
To deconstruct this a little bit:
The SELECT list names the columns you want to display plus the aggregate of the Orders.Quant column you want the sum of.
The GROUP BY clause then forms one group per (DNum, RNum) combination, so SUM adds up the quantities within each dish and restaurant pair.
Hope it helps!
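To see the grouping in action without a database server, here is a small sketch that runs the GROUP BY query through Python's built-in sqlite3 module. It loads only a subset of the Orders rows from the question (the D1 rows and the D3 rows for R10 and R40), so the totals below apply to that subset only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (CNum TEXT, DNum TEXT, RNum TEXT, Quant INTEGER, Price REAL)"
)
# Subset of the rows shown in the question.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", [
    ("C100", "D1", "R10", 2, 8.99),
    ("C100", "D1", "R40", 7, 9.99),
    ("C500", "D1", "R40", 1, 9.99),
    ("C200", "D3", "R10", 4, 16.99),
    ("C300", "D3", "R10", 2, 16.99),
    ("C200", "D3", "R40", 5, 19.99),
    ("C500", "D3", "R40", 2, 19.99),
])

totals = conn.execute("""
    SELECT DNum, RNum, SUM(Quant) AS TotalQuantity
    FROM Orders
    GROUP BY DNum, RNum
    ORDER BY DNum, RNum
""").fetchall()

for row in totals:
    print(row)  # one row per (DNum, RNum) pair with its summed quantity
```

Dish D1 from restaurant R40, for example, was ordered by two customers (7 and 1), and the query returns a single row with the combined quantity.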
Instead of multiple If ... Then statements in Excel VBA, you can use the Select Case structure. But how does one perform this task efficiently if the list of cases is long? For example, have a look at the following data:
Code ID Girls Names
0001 Sophia
0002 Emma
0003 Olivia
0004 Isabella
0005 Ava
0006 Lily
0007 Zoe
0008 Chloe
0009 Mia
0010 Madison
0011 Emily
0012 Ella
0013 Madelyn
0014 Abigail
0015 Aubrey
0016 Addison
0017 Avery
0018 Layla
0019 Hailey
0020 Amelia
0021 Hannah
0022 Charlotte
0023 Kaitlyn
0024 Harper
0025 Kaylee
0026 Sophie
0027 Mackenzie
0028 Peyton
0029 Riley
0030 Grace
0031 Brooklyn
0032 Sarah
0033 Aaliyah
0034 Anna
0035 Arianna
0036 Ellie
0037 Natalie
0038 Isabelle
0039 Lillian
0040 Evelyn
0041 Elizabeth
0042 Lyla
0043 Lucy
0044 Claire
0045 Makayla
0046 Kylie
0047 Audrey
0048 Maya
0049 Leah
0050 Gabriella
0051 Annabelle
0052 Savannah
0053 Nora
0054 Reagan
0055 Scarlett
0056 Samantha
0057 Alyssa
0058 Allison
0059 Elena
0060 Stella
0061 Alexis
0062 Victoria
0063 Aria
0064 Molly
0065 Maria
0066 Bailey
0067 Sydney
0068 Bella
0069 Mila
0070 Taylor
0071 Kayla
0072 Eva
0073 Jasmine
0074 Gianna
0075 Alexandra
0076 Julia
0077 Eliana
0078 Kennedy
0079 Brianna
0080 Ruby
0081 Lauren
0082 Alice
0083 Violet
0084 Kendall
0085 Morgan
0086 Caroline
0087 Piper
0088 Brooke
0089 Elise
0090 Alexa
0091 Sienna
0092 Reese
0093 Clara
0094 Paige
0095 Kate
0096 Nevaeh
0097 Sadie
0098 Quinn
0099 Isla
0100 Eleanor
I put the list of Code IDs in column AA and the list of girls' names in column AB. I am not going to type the whole list above into a Select Case structure, so I use the following code to do the same task. It matches the partial text in column A and prints the result in column E:
Sub Matching_ID()
.......................................
Dim ID As String, j As Integer, k As Integer, List As Integer
List = Cells(Rows.Count, "AA").End(xlUp).Row
ID = Mid(Cells(i, "A"), j, 4)
For k = List To 2 Step -1
If ID = Cells(k, "AA").Value Then
Cells(j, "E") = Cells(k, "AB").Value
Exit For
Else
Cells(j, "E") = ""
End If
Next k
.......................................
End Sub
Although the above code works fine, it is really time-consuming. Is there a better way?
You can use VLOOKUP in VBA:
Sub Matching_ID()
    Dim ID As String, j As Long, i As Long, k As Long, List As Range
    Dim sht As Worksheet, v
    Set sht = ActiveSheet
    'lookup table: codes in column AA, names in column AB, starting at row 2
    Set List = sht.Range(sht.Cells(2, "AA"), sht.Cells(sht.Rows.Count, "AB").End(xlUp))
    'i and j are assumed to be set by the code that the question leaves out
    ID = Mid(sht.Cells(i, "A"), j, 4)
    'returns match or an error value if no match
    v = Application.VLookup(ID, List, 2, False)
    sht.Cells(j, "E") = IIf(IsError(v), "", v)
End Sub
I like using Match when searching a single column:
Dim t
'try to find ID
t = Application.Match(ID, Range("AA:AA"), 0)
'if not found t will be an error so we test that
If Not IsError(t) Then
    Cells(i, "E") = Cells(t, "AB").Value
Else
    Cells(i, "E") = ""
End If
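Both answers replace the hand-written loop with a single lookup call. The same idea, expressed language-neutrally, is a keyed lookup table: build the code-to-name mapping once, then fetch each name directly by its code. The sketch below shows this in Python with a plain dict (in VBA, a Scripting.Dictionary plays a similar role). The function name lookup_name and the sample cell text are made up for illustration, and only the first four codes from the list are included.

```python
# First four entries of the Code ID / name list from the question.
names_by_code = {
    "0001": "Sophia",
    "0002": "Emma",
    "0003": "Olivia",
    "0004": "Isabella",
}

def lookup_name(cell_text, start, table):
    """Return the name for the 4-character code found at `start` (1-based, like VBA's Mid).

    Returns an empty string when the code is not in the table.
    """
    code = cell_text[start - 1:start + 3]
    return table.get(code, "")

print(lookup_name("XX0003YY", 3, names_by_code))  # code "0003" is in the table
print(repr(lookup_name("XX9999YY", 3, names_by_code)))  # unknown code gives ''
```

Each lookup is a single hash access, so the cost no longer grows with the length of the code list the way the nested loop does.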
I have the following SQL:
SELECT fldTitle
FROM tblTrafficAlerts
ORDER BY fldTitle
Which returns the results (from a NVARCHAR column) in the following order:
A1M northbound within J17 Congestion
M1 J19 southbound exit Congestion
M1 southbound between J2 and J1 Congestion
M23 northbound between J8 and J7 Congestion
M25 anti-clockwise between J13 and J12 Congestion
M25 clockwise between J8 and J9 Broken down vehicle
M3 eastbound at the Fleet services between J5 and J4A Congestion
M4 J19 westbound exit Congestion
You'll see that the M23 and M25 rows are listed above the M3 and M4 rows. This isn't pleasing to the eye, and when scanning a much longer list of results you would not expect to read them in this order.
Therefore I would like the results sorted alphabetically, then numerically, to look like:
A1M northbound within J17 Congestion
M1 J19 southbound exit Congestion
M1 southbound between J2 and J1 Congestion
M3 eastbound at the Fleet services between J5 and J4A Congestion
M4 J19 westbound exit Congestion
M23 northbound between J8 and J7 Congestion
M25 anti-clockwise between J13 and J12 Congestion
M25 clockwise between J8 and J9 Broken down vehicle
So M3 and M4 appear above M23 and M25.
This should handle it. I also added some strange data to make sure the ordering works on that too:
SELECT x
FROM
(values
('A1M northbound within J17 Congestion'),
('M1 J19 southbound exit Congestion'),
('M1 southbound between J2 and J1 Congestion'),
('M23 northbound between J8 and J7 Congestion'),
('M25 anti-clockwise between J13 and J12 Congestion'),
('M25 clockwise between J8 and J9 Broken down vehicle'),
('M3 eastbound at the Fleet services between J5 and J4A Congestion'),
('M4 J19 westbound exit Congestion'),('x'), ('2'), ('x2')) x(x)
ORDER BY
LEFT(x, patindex('%_[0-9]%', x +'0')),
0 + STUFF(LEFT(x,
PATINDEX('%[0-9][^0-9]%', x + 'x1x')),1,
PATINDEX('%_[0-9]%', x + '0'),'')
Result:
2
A1M northbound within J17 Congestion
M1 J19 southbound exit Congestion
M1 southbound between J2 and J1 Congestion
M3 eastbound at the Fleet services between J5 and J4A Congestion
M4 J19 westbound exit Congestion
M23 northbound between J8 and J7 Congestion
M25 anti-clockwise between J13 and J12 Congestion
M25 clockwise between J8 and J9 Broken down vehicle
x
x2
Perhaps this is not beautiful, but it does work:
DECLARE @tblTrafficAlerts TABLE
(
    fldTitle NVARCHAR(500)
);
INSERT INTO @tblTrafficAlerts (fldTitle)
VALUES (N'A1M northbound within J17 Congestion')
     , (N'M1 J19 southbound exit Congestion')
     , (N'M1 southbound between J2 and J1 Congestion')
     , (N'M23 northbound between J8 and J7 Congestion')
     , (N'M25 anti-clockwise between J13 and J12 Congestion')
     , (N'M25 clockwise between J8 and J9 Broken down vehicle')
     , (N'M3 eastbound at the Fleet services between J5 and J4A Congestion')
     , (N'M4 J19 westbound exit Congestion');
SELECT T.fldTitle
FROM @tblTrafficAlerts AS T
CROSS APPLY (SELECT PATINDEX('%[0-9]%', T.fldTitle)) AS N(NumIndex)
CROSS APPLY (SELECT PATINDEX('%[0-9][^0-9]%', T.fldTitle)) AS NN(NextLetter)
-- length of the number = position of its last digit - position of its first digit + 1
ORDER BY SUBSTRING(T.fldTitle, 0, N.NumIndex),
         CONVERT(INT, SUBSTRING(T.fldTitle, N.NumIndex, NN.NextLetter - N.NumIndex + 1));
This extracts everything before the first number and orders by it, then extracts that number and orders by it as an integer.
Output:
╔══════════════════════════════════════════════════════════════════╗
║ fldTitle ║
╠══════════════════════════════════════════════════════════════════╣
║ A1M northbound within J17 Congestion ║
║ M1 J19 southbound exit Congestion ║
║ M1 southbound between J2 and J1 Congestion ║
║ M3 eastbound at the Fleet services between J5 and J4A Congestion ║
║ M4 J19 westbound exit Congestion ║
║ M23 northbound between J8 and J7 Congestion ║
║ M25 anti-clockwise between J13 and J12 Congestion ║
║ M25 clockwise between J8 and J9 Broken down vehicle ║
╚══════════════════════════════════════════════════════════════════╝
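The same idea (split each title into the text before the first number and the number itself, then compare the number as an integer) can be sketched outside SQL as well. The Python below is only an illustration of that sort key, not a replacement for the T-SQL; the function name natural_key is made up, and the full title is added as a final tie-breaker, which is an assumption on my part.

```python
import re

titles = [
    "A1M northbound within J17 Congestion",
    "M1 J19 southbound exit Congestion",
    "M1 southbound between J2 and J1 Congestion",
    "M23 northbound between J8 and J7 Congestion",
    "M25 anti-clockwise between J13 and J12 Congestion",
    "M25 clockwise between J8 and J9 Broken down vehicle",
    "M3 eastbound at the Fleet services between J5 and J4A Congestion",
    "M4 J19 westbound exit Congestion",
]

def natural_key(title):
    """Sort key: (text before the first number, that number as int, full title)."""
    match = re.match(r"(\D*)(\d+)", title)
    if match is None:
        # No digits at all: sort by the text alone.
        return (title, 0, title)
    return (match.group(1), int(match.group(2)), title)

ordered = sorted(titles, key=natural_key)
for title in ordered:
    print(title)
```

Because 3 and 4 are compared as integers rather than as text, the M3 and M4 rows come out ahead of M23 and M25.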
SELECT fldTitle
FROM tblTrafficAlerts
ORDER BY LEFT(fldTitle, 1),
         LEN(LEFT(fldTitle, CHARINDEX(' ', fldTitle) - 1)),
         fldTitle
or use patindex
ORDER BY LEFT(Col1,PATINDEX('%[^0-9]%',Col1)-1)
I'd go like this:
EDIT: I separated this into two portions: the leading letter and the second part. This allows you, if needed, to treat the second part numerically (but there is a disturbing "M" in the first row...)
It would be easier to do just the second step: cut at the first blank, check the length, and prepend a '0' for sorting if needed.
DECLARE @tblTrafficAlerts TABLE(fldTitle VARCHAR(500));
INSERT INTO @tblTrafficAlerts VALUES
 ('A1M northbound within J17 Congestion')
,('M1 J19 southbound exit Congestion')
,('M1 southbound between J2 and J1 Congestion')
,('M23 northbound between J8 and J7 Congestion')
,('M25 anti-clockwise between J13 and J12 Congestion')
,('M25 clockwise between J8 and J9 Broken down vehicle')
,('M3 eastbound at the Fleet services between J5 and J4A Congestion')
,('M4 J19 westbound exit Congestion');
SELECT ta.fldTitle
      ,Leading.Letter
      ,Leading.SecondPart
FROM @tblTrafficAlerts AS ta
CROSS APPLY(SELECT SUBSTRING(ta.fldTitle,1,1) AS Letter
                  ,SUBSTRING(ta.fldTitle,2,CHARINDEX(' ',ta.fldTitle)-2) AS SecondPart) AS Leading
ORDER BY Leading.Letter
        ,CASE WHEN LEN(Leading.SecondPart)=1 THEN '0' + Leading.SecondPart ELSE Leading.SecondPart END
        ,ta.fldTitle
The result:
fldTitle                                                          Letter  SecondPart
A1M northbound within J17 Congestion                              A       1M
M1 J19 southbound exit Congestion                                 M       1
M1 southbound between J2 and J1 Congestion                        M       1
M3 eastbound at the Fleet services between J5 and J4A Congestion  M       3
M4 J19 westbound exit Congestion                                  M       4
M23 northbound between J8 and J7 Congestion                       M       23
M25 anti-clockwise between J13 and J12 Congestion                 M       25
M25 clockwise between J8 and J9 Broken down vehicle               M       25