PowerBI - Trying to sort a one-to-many column by the many column in visualizations where it would always be one-to-one

I'm working with bus data where each record in the raw data records a bus reaching a stop and how many people got on or off the bus. The raw data also includes which route the bus is on for every record, and by creating an ID of [bus route] + [bus stop], I can then reference a manually maintained stop order table so that the order for the stop in the context of the route is available for sorting. E.g. order for stop100.route5 = 4; order for stop100.route6 = 8, etc.
Example of the same stop having a different order here:
The separate table for stop order I mentioned is set up like this (filtered to show different values for the same stop):
Now that I'm trying out PowerBI I'm hitting a bit of a roadblock. I can't sort the stop column by the stop order column, as there are multiple values for each stop depending on the route in question. I know that I can still use stop order and stop as row values and toggle the 'expand all' setting, but my ideal is to hide the stop order numbers, plus in drill-down situations to stop level, the stops will be sorted alphabetically rather than by order number.
For any experts I have a few avenues I thought might be viable workarounds with enough know-how:
Is there a way to hide portions of a field's values in visuals? This whole thing wouldn't be an issue if I could use the stop.route IDs in place of the stop name field, but I would want to hide the .route portion of the value.
Is there any long-winded way to create a one-to-one sorting that I can use to sort the stop column? Some sort of dynamic calculation that filters a one-to-many to a one-to-one, as every context I plan on using this, there will be only one possible order # for the stop.
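To make the data shape concrete, here is a minimal Python sketch (not Power BI itself) of the lookup described above: a stop order table keyed by the combined stop and route, and a sort that becomes one-to-one as soon as a single route is fixed. Only the two stop100 orders come from the example above; the other rows and the names STOP_ORDER and stops_in_route_order are made up for illustration.

```python
# Hypothetical stop order table keyed by (stop, route).
# Only the two stop100 rows come from the example in the question.
STOP_ORDER = {
    ("stop100", "route5"): 4,
    ("stop100", "route6"): 8,
    ("stop200", "route5"): 1,
    ("stop300", "route5"): 9,
}


def stops_in_route_order(stops, route):
    """Sort stop names by their order within one route.

    The order number is only used as the sort key, so it never has to be shown.
    """
    return sorted(stops, key=lambda stop: STOP_ORDER[(stop, route)])


print(stops_in_route_order(["stop300", "stop100", "stop200"], "route5"))
# → ['stop200', 'stop100', 'stop300']
```

Within one route each stop has exactly one order number, which is the one-to-one sorting I am hoping to reproduce in the visual.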
Many thank yous to anyone with advice!

Related

Reducing database load from consecutive queries

I have an application which calls the database multiple times to achieve one simple goal.
A little information about this application: in short, it scrapes data from a webpage and stores specific information from that page in a database. The important information in this query is: player name, position, kill points and class.
Player name can change or remain the same every day
Position: there can be multiple players sitting in one position
Kill points can increase or remain the same every day
Class: there are only 2 possibilities for a name, e.g. A can change to B or remain A (and the same in reverse), but it cannot be C, D, E or F
The player name can change on any particular day, and the position can also change depending on the kill point increase since the last update, which brings us back to the goal: to search the database day by day, from the current date as far back as 2021-02-22, starting at the most recent entry for a player name and backtracking to the previous day to check whether that player name is still the same or has changed.
The main reference for detecting a change is the kill points. As the days go on, this number either stays exactly the same or increases; it can never decrease.
So now onto the implementation of this application.
The first query which runs finds the most recent entry for the player name
SELECT TOP(1) * FROM [changes] WHERE [CharacterName]=@charname AND [Territory]=@territory AND [Archived]=0 ORDER BY [Recorded] DESC
Then continue to check the previous days entries with the following query:
SELECT TOP(1) * FROM [changes] WHERE [Territory]=@territory AND [CharacterName]=@charname AND [Recorded]=@searchdate AND ([Class] LIKE '%{Class}%' OR [Class] LIKE '%{GetOpposite(Class)}%') AND [Archived]=0
If no results are found, it then proceeds to find an alternative name with the following query:
SELECT TOP(5) * FROM [changes] WHERE [Kills] <= @kills AND [Recorded]='{Data.Recorded.AddDays(-1):yyyy-MM-dd}' AND [Territory]=@territory AND [Mode]=@mode AND ([Class] LIKE @original OR [Class] LIKE @opposite) AND [Archived]=0 ORDER BY [Kills] DESC
The aim of the query above is to get the top 5 entries that are the closest possible matches, which are then cross-referenced with the day ahead:
SELECT COUNT(*) FROM [changes] WHERE [CharacterName]=@CharacterName AND [Territory]=@Territory AND [Recorded]=@SearchedDate AND [Archived]=0
When checking the day ahead, if the character name is not found there, it is considered to be the old player name for this specific character. Otherwise, if all 5 results have been searched and every one of them is present in the day-ahead search, the name is considered to be new to the table.
From the date this application started running up to today's date, that comes to over 400 individual queries on the database to achieve one goal.
It is also worth noting that this table grows by 14,400 - 14,500 rows each and every day.
The overall question: is it possible to combine all these queries into fewer calls to the database, to reduce the number of queries and improve performance?
What you can do to improve performance will be based on what parts of the application stack you can manipulate. Things to try:
Store Less Data - Database content retrieval speed is largely based on how well the database is ordered/normalized and just how much data needs to be searched for each query. Managing a cache of prior scraped pages and only storing data when there's been a change between the current scrape and the last one would guarantee less redundant requests to the db.
Separate specific classes of data - Separating data into dedicated tables would allow you to query a specific table for a specific character, etc... effectively removing one where clause.
Reduce time between queries - Fewer incoming concurrent requests mean less resource contention and faster response times for earlier requests.
Use another data structure - The only reason you're using top() is because you need data ordered in some specific way (most-recent, etc...). If you used an in-memory data structure that keeps the data ordered and is still easy to query, you could perhaps offload some SQL requests to that structure instead of the db.
The suggestions above are not exhaustive, but what you do to improve performance is largely a function of what in the application stack you have the ability to modify.
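As a rough illustration of the "use another data structure" idea, here is a minimal Python sketch that assumes the rows for one territory have already been fetched in a single query and then does the day-by-day backtracking in memory. The row layout (dicts with recorded, name and kills), the function names and the sample data are all assumptions for illustration; the class filter from the question is left out to keep it short.

```python
from collections import defaultdict
from datetime import date, timedelta


def index_by_day(rows):
    """Group rows by recorded date, then by character name."""
    by_day = defaultdict(dict)
    for row in rows:
        by_day[row["recorded"]][row["name"]] = row
    return by_day


def previous_name(by_day, current, max_candidates=5):
    """Return the name this player used the day before `current`.

    Returns None when no candidate fits, i.e. the name looks new to the table.
    """
    day = current["recorded"]
    prev = by_day.get(day - timedelta(days=1), {})
    if current["name"] in prev:
        return current["name"]  # name unchanged since yesterday
    today_names = by_day.get(day, {})
    # Kills never decrease, so only yesterday's rows with kills <= today's
    # can be the same player; the closest matches come first.
    candidates = sorted(
        (r for r in prev.values() if r["kills"] <= current["kills"]),
        key=lambda r: r["kills"],
        reverse=True,
    )[:max_candidates]
    for cand in candidates:
        if cand["name"] not in today_names:
            return cand["name"]  # vanished the day ahead: treat as the old name
    return None


d1, d2 = date(2021, 2, 22), date(2021, 2, 23)
rows = [
    {"recorded": d1, "name": "Alice", "kills": 100},
    {"recorded": d1, "name": "Bob", "kills": 50},
    {"recorded": d2, "name": "Alicia", "kills": 120},
    {"recorded": d2, "name": "Bob", "kills": 50},
    {"recorded": d2, "name": "Newbie", "kills": 10},
]
by_day = index_by_day(rows)
print(previous_name(by_day, by_day[d2]["Alicia"]))  # → Alice
```

Whether this is a net win depends on how much history fits in memory; at roughly 14,500 rows a day it may be more practical to load one territory or one date window at a time.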

How to set the explicit order for child table rows for one-to-many SQL relation?

Imagine a database with two tables, lists (with id and name) and items (with id, list_id, which is a foreign key linking to lists.id, and name) and the application with ORM and the corresponding models.
A task: have a way in the application to create/edit/view the list and the items inside it (that should be pretty easy), but also saving the order of the items within one list and allowing to reorder the items within one list (so, a user creates the items list, then swaps two items, then when displaying the list, the items order should be preserved) or deleting items.
What is the best way to implement it, database-wise? Which db structure should I use for it?
I see these ways of solving it:
not using the external table for items, but storing everything in a list document (as a postgres jsonb column for example) - can work but I suppose that's not RDBMS way to do it and if the user would want to update the single item, the whole list object would need to be updated
having a position field in items table and adding a way to manage the position in the API - can work, but it's quite complicated (like, handling the cases where the position is the same for some items, handling swapping items, handling items deletions and having to decrease the position of all the items coming after the deleted one etc.)
Is there a simple way of implementing it? Like the one used in production by some big companies? I'm really curious about how such cases are handled in real life.
This is a more theoretical question, so no code samples here (except for the db structure).
This is a good question, which as far as I know doesn't have any simple answers. I once came up with a solution for a high volume photo sharing site using an item table with columns list_id and position as you describe. The key to performance was to minimize renumbering as this database had millions of photos (and more than 2^32 likes).
The only operation was to move a single item to another point in the list (before or after another item in the list). This would work by first assigning positions with large steps, e.g. 1000, 2000, 3000. Whenever an item is moved between two others the average is used, e.g. move from pos=3000 to 1500. Eventually you can try to move an item between two items that have consecutive position numbers. Then you choose to renumber items either above or below depending on which way requires fewer updates (e.g. if there were a run of consecutive positions). This was done using RANK and @vars as I recall on MySQL 5.7.
This did work well resolving a problem where there was intermittent unavailability in production due to massive renumberings that were occurring before when consecutive positions were used.
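The stepping and averaging scheme described above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (integer positions, a step of 1000); the names STEP, initial_positions and position_between are hypothetical and not taken from the production code.

```python
STEP = 1000  # assumed step size, matching the 1000, 2000, 3000 example


def initial_positions(count):
    """Assign widely spaced positions to `count` items."""
    return [STEP * (i + 1) for i in range(count)]


def position_between(before, after):
    """Return the midpoint between two neighbours.

    Returns None when the neighbours are consecutive, which is the case
    where some renumbering is needed before the move can happen.
    """
    if after - before < 2:
        return None
    return (before + after) // 2


positions = initial_positions(3)
print(positions)                             # → [1000, 2000, 3000]
print(position_between(1000, 2000))          # → 1500
print(position_between(1500, 1501) is None)  # → True
```

With a step of 1000, roughly ten moves can land in the same gap before two neighbours become consecutive, so renumbering stays rare.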
I was able to dig up a couple of the queries (that were meant to go into a blog post ages ago). Turns out this was MySQL before RANK() was a thing, which is why the @shuffle_rank variable was used. The + 0 (and the + 1) is because this is the actual SQL sent to the database, but it was generated in code. This is to find the first gap below (greater than) position 120533287:
SELECT shuffle_rank, position
FROM (SELECT @shuffle_rank := @shuffle_rank + 1 AS shuffle_rank, position
FROM `gallery_items`
JOIN (SELECT @shuffle_rank := 0) initialize_rank_var
WHERE `gallery_items`.`gallery_id` = 14103882 AND (position >= 120533287)
ORDER BY position ASC) positionable_items
WHERE ABS(120533287 - position) >= shuffle_rank + 0 LIMIT 1
Here's the update query after the above query and supporting code decided that 3 rows need to be updated to make a gap. The + 1 here may be larger if renumbering with some gap if there's room.
UPDATE `gallery_items`
SET position = -222 + (@shuffle_rank := @shuffle_rank + 1)
WHERE `gallery_items`.`gallery_id` = 24669422
AND (position >= -222)
AND ((SELECT @shuffle_rank := 0) = 0)
ORDER BY position ASC
LIMIT 3
Note that this pair of actual queries aren't for the same operation seeing as they have different gallery_id values (aka list_id).
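For readers who find the variable-based SQL hard to follow, here is a minimal Python sketch of what the two queries appear to do together: rank the positions at or after an anchor, find the first row whose distance from the anchor is at least its rank (the first gap), then shift the rows before it up by one. It assumes distinct integer positions held in a plain list; the function names are hypothetical.

```python
def rows_to_shift(positions, anchor):
    """Count the rows at or after `anchor` that sit in an unbroken run.

    Mirrors the SELECT: the first row where the distance from the anchor
    is >= its rank marks the gap; every row ranked before it must move.
    """
    run = sorted(p for p in positions if p >= anchor)
    for rank, pos in enumerate(run, start=1):
        if abs(anchor - pos) >= rank:
            return rank - 1
    return len(run)  # no gap found: the whole run has to move


def open_gap(positions, anchor):
    """Shift the run up by one so that `anchor` becomes free.

    Mirrors the UPDATE with LIMIT: row number k in the run gets anchor + k.
    """
    count = rows_to_shift(positions, anchor)
    run = sorted(p for p in positions if p >= anchor)[:count]
    moved = {old: anchor + rank for rank, old in enumerate(run, start=1)}
    return sorted(moved.get(p, p) for p in positions)


print(rows_to_shift([10, 11, 12, 16, 20], 10))  # → 3
print(open_gap([10, 11, 12, 16, 20], 10))       # → [11, 12, 13, 16, 20]
```

Only the three rows in the consecutive run are touched; the rows at 16 and 20 keep their positions, which is the point of looking for the nearest gap.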

How to handle reoccurring calendar events and tasks (SQL Server tables & C#)

I need to schedule events, tasks, appointments, etc. in my DB. Some of them will be one time appointments, and some will be reoccurring "To-Dos" which must be checked off. After looking at Google's calendar layout and others, plus doing a lot of reading, here is what I have so far.
Calendar table (Could be called schedule table I guess): Basic_Event Title, start/end, reoccurs info.
Calendar occurrence table: ties to schedule table, occurrence specific text, next occurrence date / time????
Looked here at how SQL Server does its jobs: http://technet.microsoft.com/en-us/library/ms178644.aspx
but this is slightly different.
Why two tables: I need to track status of each instance of the reoccurring task. Otherwise this would be much simpler...
so... on to the questions:
1) Does this seem like the proper way to go about it? Is there a better way to handle the multiple occurrence issue?
2) How often / how should I trigger creation of the occurrences? I really don't want to create a bunch of occurrences... BUT... What if the user wants to view next year's calendar...
Makes sense to have your schedule definition for a task in one table and then a separate table to record each instance of that separately - that's the approach I've taken in the past.
And with regards to creating the occurrences, there's probably no need to create them all up front. Especially when you consider tasks that repeat indefinitely! Again, the approach I've used in the past is to only create the next occurrence. When that instance is actioned, the next instance is then calculated and created.
This leaves the issue of viewing future occurrences. For this, you can start off with the initial/next scheduled occurrence and just calculate the future occurrences on-the-fly at display time.
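Calculating occurrences on the fly could look something like the following Python sketch. It assumes the simplest possible recurrence rule, a fixed interval in days, and a stored next-occurrence date; the function name and parameters are illustrative and not tied to any particular schema.

```python
from datetime import date, timedelta


def occurrences(start, every_days, range_start, range_end):
    """Yield the occurrence dates that fall inside [range_start, range_end].

    `start` is the stored initial/next occurrence; nothing is written to
    the database, the dates are computed for display only.
    """
    if range_start > start:
        # Jump straight to the first occurrence on or after range_start
        # (ceiling division on the number of days to skip).
        steps = -(-(range_start - start).days // every_days)
        current = start + timedelta(days=steps * every_days)
    else:
        current = start
    while current <= range_end:
        yield current
        current += timedelta(days=every_days)


weekly = list(occurrences(date(2024, 1, 1), 7, date(2024, 1, 10), date(2024, 1, 31)))
print(weekly)
```

Because the generator skips directly to the requested range, showing next year's calendar costs no more than showing next week's.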
While this isn't an exact answer to your question I've solved this problem before in SQL Server (though database here is irrelevant) by modeling a solution based on Unix's cron.
Instead of string parsing we used integer columns in a table to store the various time units.
We had events which could be scheduled; they could either point to a one-time schedule table that represented a distinct point in time (a date/time) or to the recurring schedule table which is modelled after cron.
Additionally remember to model your solution correctly. An event has a duration but the duration is unrelated to the schedule (but an event's duration may impact the schedule by causing conflicts). Do not try to model duration as part of your schedule.
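A cron-style recurring schedule with integer columns might be modelled along these lines. This is a minimal Python sketch, not the schema we actually used: the field names are assumptions, None stands in for cron's wildcard, and unlike real cron it requires day of month and day of week to both match when both are set.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RecurringSchedule:
    """One row of a hypothetical recurring schedule table; None means 'any'."""
    minute: Optional[int] = None
    hour: Optional[int] = None
    day_of_month: Optional[int] = None
    month: Optional[int] = None
    day_of_week: Optional[int] = None  # 0 = Monday, as in datetime.weekday()

    def matches(self, moment: datetime) -> bool:
        """True when every non-wildcard field equals the moment's value."""
        pairs = [
            (self.minute, moment.minute),
            (self.hour, moment.hour),
            (self.day_of_month, moment.day),
            (self.month, moment.month),
            (self.day_of_week, moment.weekday()),
        ]
        return all(want is None or want == got for want, got in pairs)


# Every Monday at 09:00; note that duration is deliberately not part of this.
monday_nine = RecurringSchedule(minute=0, hour=9, day_of_week=0)
print(monday_nine.matches(datetime(2024, 1, 1, 9, 0)))  # → True
print(monday_nine.matches(datetime(2024, 1, 2, 9, 0)))  # → False
```

Storing the units as integers means the same comparison can be written as a WHERE clause, with NULL playing the role of None.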
In the past when we've done this, we had 2 tables:
1) Schedules -> Includes recurrence information
2) Exceptions -> Edit/changes to specific instances
Using SQL, it's possible to get the list of "Schedules" that have at least one instance in a given date range. Then you can expand in the GUI where each instance lies.

Using NSFetchedResultsController with no sorting

1) I wanna display my search results in the same order returned by the web Service, but it seems 'An instance of NSFetchedResultsController requires a fetch request with sort descriptors'.
2) I still wanna use a NSFetchedResultsController because I allow user to sort by date, etc, but if no sorting is chosen I want to display them in the exact order I got them.
3) Another thing, depending on the search, the items might have different priority. Since I store every item, I cannot just create a priority for each since it won't apply to every case.
Thanks in advance
Lucas,
If you want to enforce an order, then you need an attribute to sort against. I suggest you add a serial number to your model and bump it as you insert items.
Andrew

Want an efficient approach to retrieving records from a database when the retrieval is weighted and balanced

I'm working on something incredibly unique... a property listings website. ;)
It displays a list of properties. For each property a teaser image and some caption data is displayed. If the teaser image and caption catch a site visitor's interest, they can click on it and get a full property profile. All very standard.
The customer wants to be able to allow property owners to add multiple teaser images and to be able to track which teaser images got the most click throughs. No worries there.
But they also want to allow the property owner to weight each teaser image to control when it is shown. So for 3 images with weightings of 2, 6, 2, the 2nd image would be shown 6/10 times. This needs to be balanced. If the 2nd image is shown the first 6 times, it can't be shown again until the 1st and 3rd images have been shown twice each.
So I need to both increment how often an image has been retrieved and also retrieve images in a balanced way. Forget about actual image handling, I'm actually just talking about URLs.
Note incrementing how often it has been retrieved is a different animal to incrementing how often it has captured a click through.
So I can think of a few different ways to approach the problem using database triggers or maybe some LINQ2SQL, etc., but it strikes me that someone out there will know of a solution that could be orders of magnitude faster than what I might come up with.
My first rough idea is to have a schema like so:
TeaseImage(PropId, ImageId, ImageUrl, Weighting, RetrievedCount, PropTotalRetrievedCount)
and then
select ImageRanks.*
from (Select t.ImageID,
t.ImageUrl,
rank() over (partition by t.RetrievedCount order by sum(t.RetrievedCount) desc) as IMG_Rank
from TeaseImage t
where t.RetrievedCount<t.Weighting
group by t.PropID) ImageRanks
where ImageRanks.IMG_Rank <= 1
And then
1. for each ImageId in the result set increment RetrievedCount by 1 and then
2. for each PropId in ResultSet increment PropTotalRetrievedCount by 1 and then
3. for each PropId in ResultSet check if PropTotalRetrievedCount ==10 and if so reset it to PropTotalRetrievedCount = 0 and RetrievedCount=0 for each associated ImageId
Which frankly sounds awful :(
So any ideas?
Note: if I have to step out of the datalayer I'd be using C# / .Net. Thanks.
If you want to do this entirely in your database, you could split your table in two:
Image(ImageId, ImageUrl)
TeaseImage(TeaseImageId, PropId, ImageId, DateLastAccessed)
The TeaseImage table manages weightings by storing additional (redundant) copies of each property-image pair. So an image with a weight of six would get six records.
Then the following query gives you the least-recently used record.
select top 1 ti.TeaseImageId, i.ImageUrl
from TeaseImage ti
join Image i
on i.ImageId = ti.ImageId
where ti.PropId = @PropId
order by ti.DateLastAccessed
Following the select, just update the record's DateLastAccessed. (Or even update it as part of the select procedure, depending on how fault-tolerant you need to be.)
Using this technique would give you fine-grained control over the order of image delivery, (by seeding their DateLastAccessed values appropriately) and you could easily modify the ratios if need be.
Of course, as the table grows, the additional records would degrade query performance earlier than other approaches, but depending on the cost of the query relative to everything else that's going on that may not be significant.
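To see why the duplicated rows give a balanced rotation, here is a small in-memory Python sketch of the same idea: one slot per unit of weight, always serve the least recently accessed slot, then stamp it. A counter stands in for DateLastAccessed, and the function names and image ids are made up for illustration.

```python
import itertools
from collections import Counter


def build_slots(weights):
    """Create one [last_accessed, image_id] slot per unit of weight."""
    return [[0, image_id] for image_id, weight in weights.items() for _ in range(weight)]


def next_image(slots, clock):
    """Serve the least recently accessed slot and stamp it with the next tick."""
    slot = min(slots, key=lambda s: s[0])
    slot[0] = next(clock)
    return slot[1]


slots = build_slots({"img1": 2, "img2": 6, "img3": 2})
clock = itertools.count(1)  # stands in for the DateLastAccessed timestamp
served = [next_image(slots, clock) for _ in range(10)]
print(Counter(served))
```

Every slot is used exactly once per cycle of ten retrievals, so the 2/6/2 split holds for each cycle; the order within a cycle depends only on how the initial access values are seeded.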