Count number of rows with similar times - SQL
Good afternoon. Here is the situation: a train carries a geotag that reports its position in space, and the location data is written to a table. I need to calculate how many times the train was at a certain point. The problem is that while the train sits at a point, the geotag leaves several entries in the table with different times. I wrote a query that counts the number of arrivals, but it only works if the selection is limited to a single train; if several trains are selected, the query counts incorrectly. Below I attach the table and the query.
[table screenshot]
[query1 screenshot]
If you select several train numbers, the values get mixed up and are counted as one.
This is the query I have now; it counts the number of arrivals, but incorrectly. If I specify only one train in the selection instead of several, everything is correct. What is my mistake?
[query2 screenshot]
I think you should use GROUP BY on address and zone.
By "train"| you mean adress?
Related
Reducing database load from consecutive queries
I have an application which calls the database multiple times to achieve one simple goal. A little information about this application: in short, it scrapes data from a webpage and stores specific information from that page in a database. The important fields in this question are: player name, position, kill points, and class.

- The player name can change or remain the same from day to day.
- There can be multiple players sitting at one position.
- Kill points can increase or remain the same each day; they can never decrease.
- Class has only two possibilities, e.g. A can change to B or remain A (same in reverse), but cannot be C, D, E, or F.

The player name can change on any particular day, and position can also change depending on the kill-point increase since the last update, which brings us back to the goal: to search the database day by day, from the current date as far back as 2021-02-22, starting at the most recent entry for a player name and backtracking to the previous day to check whether that player name is still the same or has changed. The main reference for detecting a change is the kill points: as the days go on, this number either stays exactly the same or increases; it can never decrease.

Now, on to the implementation. The first query finds the most recent entry for the player name:

    SELECT TOP(1) * FROM [changes]
    WHERE [CharacterName]=#charname AND [Territory]=#territory AND [Archived]=0
    ORDER BY [Recorded] DESC

Then it continues to check the previous days' entries with the following query:

    SELECT TOP(1) * FROM [changes]
    WHERE [Territory]=#territory AND [CharacterName]=#charname
      AND [Recorded]=#searchdate
      AND ([Class] LIKE '%{Class}%' OR [Class] LIKE '%{GetOpposite(Class)}%' AND [Archived]=0)

If no results are found, it proceeds to find an alternative name with the following query:

    SELECT TOP(5) * FROM [changes]
    WHERE [Kills] <= #kills AND [Recorded]='{Data.Recorded.AddDays(-1):yyyy-MM-dd}'
      AND [Territory]=#territory AND [Mode]=#mode
      AND ([Class] LIKE #original OR [Class] LIKE #opposite) AND [Archived]=0
    ORDER BY [Kills] DESC

The aim of the query above is to get the top 5 closest possible matches, which are then cross-referenced with the day ahead:

    SELECT COUNT(*) FROM [changes]
    WHERE [CharacterName]=#CharacterName AND [Territory]=#Territory
      AND [Recorded]=#SearchedDate AND [Archived]=0

When checking the day ahead, if the character name is not found there, it is considered to be the old player name for this specific character; otherwise, if all 5 results are found to be present in the day-ahead searches, the name is considered new to the table.

From the date this application started running up to today's date, that is over 400 individual queries against the database to achieve one goal. It is also worth noting that the table grows by 14,400-14,500 rows each and every day. The overall question: is it possible to bring all these queries down to fewer calls on the database, reduce queries, and improve performance?
What you can do to improve performance will be based on what parts of the application stack you can manipulate. Things to try:

- Store less data. Database retrieval speed is largely a function of how well the database is ordered/normalized and how much data needs to be searched for each query. Managing a cache of prior scraped pages and only storing data when there has been a change between the current scrape and the last one would guarantee fewer redundant requests to the db.
- Separate specific classes of data. Splitting data into dedicated tables would allow you to query a specific table for a specific character, etc., effectively removing one WHERE clause.
- Reduce time between queries. Fewer incoming concurrent requests means less resource contention and faster response times to prior requests.
- Use another data structure. The only reason you're using TOP() is that you need data ordered in some specific way (most recent, etc.). If you used an in-code data structure that keeps the data ordered and still easily queryable, you could offload some SQL requests to this structure instead of the db.

The suggestions above are not exhaustive, but what you do to improve performance is largely a function of what in the application stack you have the ability to modify.
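As a concrete illustration of the last two points, one way to collapse the day-by-day lookups is to fetch the whole history for a territory in a single round trip and walk it in memory, with a window function ordering each name's entries by date. A rough sketch against the [changes] table from the question, keeping its parameter style (the #today parameter and the selected column list are assumptions):

    -- One query instead of one per day: every entry for the territory in
    -- the date range, newest first within each character name.
    SELECT [CharacterName], [Recorded], [Kills], [Class],
           ROW_NUMBER() OVER (PARTITION BY [CharacterName]
                              ORDER BY [Recorded] DESC) AS rn
    FROM [changes]
    WHERE [Territory] = #territory
      AND [Recorded] BETWEEN '2021-02-22' AND #today
      AND [Archived] = 0
    ORDER BY [CharacterName], [Recorded] DESC;

The day-to-day comparison then happens over this one result set in application code rather than through 400+ individual queries.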
How to populate all possible combinations of values in columns, using Spark/normal SQL
I have a scenario where my original dataset looks like the below.

Data:

    Country,Commodity,Year,Type,Amount
    US,Vegetable,2010,Harvested,2.44
    US,Vegetable,2010,Yield,15.8
    US,Vegetable,2010,Production,6.48
    US,Vegetable,2011,Harvested,6
    US,Vegetable,2011,Yield,18
    US,Vegetable,2011,Production,3
    Argentina,Vegetable,2010,Harvested,15.2
    Argentina,Vegetable,2010,Yield,40.5
    Argentina,Vegetable,2010,Production,2.66
    Argentina,Vegetable,2011,Harvested,15.2
    Argentina,Vegetable,2011,Yield,40.5
    Argentina,Vegetable,2011,Production,2.66
    Bhutan,Vegetable,2010,Harvested,7
    Bhutan,Vegetable,2010,Yield,35
    Bhutan,Vegetable,2010,Production,5
    Bhutan,Vegetable,2011,Harvested,2
    Bhutan,Vegetable,2011,Yield,6
    Bhutan,Vegetable,2011,Production,3

[image of the above csv]

Now there is a very small country lookup table which lists all possible countries the source data can come with:

[image of the country lookup table]

I want the output data's number of columns to always be fixed (this is to ensure the reporting/visualization tool doesn't get a dynamic number of columns with every day's new source-data ingestion, depending on the varying number of distinct countries present). So I have to somehow join the source data with the country_lookup csv and populate all those columns with the default value F. Every country column would be binary, with T or F as the possible values.

The original dataset above has to be converted into the below (I've kept the Amount field for rows of Type 'Derived Yield' as formulas rather than calculated values, for better understanding and so you can match them against the formulae):

    Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
    US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
    US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
    US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
    US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
    US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
    US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
    US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
    US,Vegetable,2011,Yield,18,T,F,F,F,F,F
    US,Vegetable,2011,Production,3,T,F,F,F,F,F
    US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
    US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
    US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
    Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
    Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
    Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
    Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
    Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
    Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
    Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
    Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
    Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
    Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
    Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
    Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
    Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
    Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
    Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
    Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
    Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
    Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
    Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
    Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
    Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
    Bhutan,Vegetable,2011,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
    Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
    Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F

[images of the expected output, for a structured look at it: Part 1, Part 2]

Formula for populating the Amount field for the Derived type: Derived Amount = (sum of Harvested of all countries marked T, grouped by Year and Commodity) divided by (sum of Production of all countries marked T, grouped by Year and Commodity).

So the target is to take every combination of the countries from the source and calculate the sums of the respective Harvested and Production values, which are then divided. There can be more than one commodity for any given country in the actual scenario, but that should not matter, as the summation of Amount happens on grouped Commodity and Year.

Note: users in the frontend can select any combination of countries. The sole reason for doing this in the backend rather than dynamically in the frontend is that AWS QuickSight (our visualisation tool), although it can compute sums on selected column filters, doesn't yet support calculations on those derived summed fields. Hence the calculation for every combination of countries has to be pre-populated (a very naive approach) to make it available in the report for users' dynamic selection of countries.

Also, if you have any better approach than the naive one mentioned in the note, you are most welcome to guide me. I've also posted a question on the same problem without describing my expected approach, for experts to show how this kind of problem can be solved better; if you want to help solve it with some other technique, here is the link to that question. Any help shall be greatly acknowledged.
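No answer was posted here, but as a rough sketch of the fixed-column part: since the country columns come from the lookup table rather than from each day's file, the T/F indicators can be generated in Spark SQL with one CASE per lookup country. The table name data and the hard-coded column list below are assumptions taken from the question; in practice you would generate the CASE list from country_lookup:

    -- Fixed indicator columns: T for the row's own country, F otherwise.
    -- The column set never changes, whatever countries today's file has.
    SELECT Country, Commodity, Year, Type, Amount,
           CASE WHEN Country = 'US'         THEN 'T' ELSE 'F' END AS US,
           CASE WHEN Country = 'Argentina'  THEN 'T' ELSE 'F' END AS Argentina,
           CASE WHEN Country = 'Bhutan'     THEN 'T' ELSE 'F' END AS Bhutan,
           CASE WHEN Country = 'India'      THEN 'T' ELSE 'F' END AS India,
           CASE WHEN Country = 'Nepal'      THEN 'T' ELSE 'F' END AS Nepal,
           CASE WHEN Country = 'Bangladesh' THEN 'T' ELSE 'F' END AS Bangladesh
    FROM data;

The 'Derived Yield' rows are the harder part: they require generating every subset of the source countries and computing SUM(Harvested)/SUM(Production) per Commodity and Year for each subset, which grows as 2^n with the number of countries.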
DAX sum different DateTime
I have a problem here. I would like to sum the work time of my employees based on the data (time2 - time1) daily, and here is my measure:

    Effective Minute Work Time = 24. * 60 * (LASTNONBLANK(time2, 0) - FIRSTNONBLANK(time1, 0))

It works daily, but if I drill up to weekly/monthly data it shows the wrong sum, as shown below. What I want is the summary of minutes between the daily time differences (time2 - time1). Thanks for your help :)
You have several approaches you can take: the hard way or the easier way :).

The harder (at least for me :)) is to use DAX. You would: 1) create a date table, 2) use the DAX CALCULATE function to evaluate your LASTNONBLANK and FIRSTNONBLANK values (you might need CALCULATETABLE, but I'm not sure; DAX experts jump in), then subtract one from the other. This will give you correct values for a given day for a given person. You can enforce the latter condition by putting a HASONEVALUE guard on the person name so that your measure informs the report author if they're not using it right.

Doing the same for dates is a little trickier. In the example you show, you include the date in the row grouping. But if you change your mind and want 'total hours worked by person' or 'total hours worked by everyone' instead, you're not done with modelling yet. Your next step is to use CALCULATETABLE in combination with CALCULATE to create a measure that returns the total: CALCULATETABLE to evaluate, for each date, the hours worked on that date by each person, and CALCULATE to summarize that all down to a single number. If you're not careful with your DAX (or report authoring) you might mix up which person you're summarizing for, so that your first/last non-blank values are not at the person level. It gets intense quickly.

Your easier solution, though it might be more limited in its application (it really depends on your scenario), is to use the query to transform the data into a summary by day and person using a group-by. This gives you a row per person per day with their start and end times, from which you can quickly calculate the hours worked that day, and then quite easily build visuals on top of the summary data. Of course, you give up some of the flexibility of having a proper data model. However, if you have a date table, a person table, and your summary table, and you set up the relationships correctly, you can answer the most common questions.
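A minimal SQL sketch of that easier group-by route, assuming a work_log source table with person, time1, and time2 columns (the names are hypothetical, and DATEDIFF is SQL Server syntax; adapt to your source):

    -- One row per person per day: first start, last end, minutes between.
    -- Totals built on top of this sum correctly at weekly/monthly level.
    SELECT person,
           CAST(time1 AS DATE) AS work_date,
           DATEDIFF(minute, MIN(time1), MAX(time2)) AS minutes_worked
    FROM work_log
    GROUP BY person, CAST(time1 AS DATE);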
Two Dimensional Diagram with aggr function
I'm having a very curious problem in QlikView. I have a number of readouts from a database which show amounts of time spent in different states. In that table there are 49 variables that describe the state: there are 7 levels of, e.g., the SOC and seven states of the temperature, so one of the fields could be named SOC1_T1 or SOC2_T1, and so on. What I get is a table full of readouts, each with a specific id for the object, the state variables, and an age; there are multiple entries per object.

What I want is to plot a two-dimensional diagram over all the states, i.e. a SOC-over-temperature histogram (average of the maximum, or newest, value of every object). I tried creating two dynamic (or synthetic) dimensions, ValueLoop(1,7) and ValueLoop(1,8). In the formulas I referred to them with

    =If(ValueLoop(1,7) = 1 and ValueLoop(1,8) = 1, (avg(aggr(FirstSortedValue(SOC1_T1, -age), id)) * 100))

and created 49 formulas, one per state variable.

The problem: it only shows the first entry. If I replace the whole expression in the If condition with a specific number (100), I get a result. I also put the inner expression into a list box and checked that the result is not null. As soon as I delete the aggr function and just take the avg over everything (which is not what I want), everything works fine. When I turn back to aggr, only the first one is shown. It doesn't help to delete one of the dimensions either; this doesn't work one-dimensionally. Any ideas or workarounds?

Greetings, Julian
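No answer was posted for this one, but for reference, the aggregation the inner expression is reaching for (per object, take the SOC1_T1 value that FirstSortedValue(SOC1_T1, -age) would pick, then average across objects) looks like this in plain SQL, with hypothetical table and column names:

    -- Per object, keep the reading sorted first by descending age
    -- (whether that is the newest depends on how age is recorded),
    -- then average the kept values across all objects.
    WITH picked AS (
        SELECT id, SOC1_T1,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY age DESC) AS rn
        FROM readouts
    )
    SELECT AVG(SOC1_T1) * 100 AS avg_soc1_t1
    FROM picked
    WHERE rn = 1;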
My first count gets updated by my second count?
The database we use is structured so that we have a table for Training Programs, one for Training Events linked to the Training Program table, and another for Training Event Activities linked to the Training Event table. There's a lot more, but those are the basics. I joined the training event and activity tables together, then the training program table to the training event table. In my select statement I did two counts, one of TrainingEvent.guTrainingProgramId (a linked column that relates to TrainingProgram.rowguid). If I remove all instances of Training Event Activity, I get what seems to be an accurate count of the number of events for each entry in the Training Program table. However, once I add in the count of TrainingEventActivity.guTrainingEventId (the link to the Training Event table), I get an accurate count of the total number of activities, but the TrainingEvent count changes to reflect the count of activities. I essentially want a count of the number of times that TrainingEvent.guTrainingProgramId = TrainingProgram.rowguid, and a second count where TrainingEventActivity.guTrainingEventId = TrainingEvent.rowguid. What am I doing wrong? Thanks!
Without seeing your SQL, it's hard to tell, but my initial thought is that you'll need a COUNT(DISTINCT ...) instead of COUNT on the id. Oh well. Let's see the SQL.
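For illustration, a sketch of that suggestion using the table and column names from the question: joining in activities repeats each event row once per activity, so the event count has to be de-duplicated.

    -- The join fans out: each TrainingEvent row repeats once per activity,
    -- so a plain COUNT double-counts events; COUNT(DISTINCT) corrects it.
    SELECT tp.rowguid,
           COUNT(DISTINCT te.rowguid)   AS event_count,
           COUNT(tea.guTrainingEventId) AS activity_count
    FROM TrainingProgram tp
    LEFT JOIN TrainingEvent te
           ON te.guTrainingProgramId = tp.rowguid
    LEFT JOIN TrainingEventActivity tea
           ON tea.guTrainingEventId = te.rowguid
    GROUP BY tp.rowguid;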