Counting items over time using SPARQL

Given a list of items with start and end dates (Nobel Physics prize winners with birth and death in Wikidata, for example), how could I use SPARQL to return a count of the number in existence in every year of a certain range? For example, how many Nobel winners were alive every year from 1950 to 2000?
I can see how to count the winners alive in a single year, such as 1950:
SELECT (COUNT(DISTINCT ?nobel) AS ?count)
WHERE
{
  ?nobel wdt:P166 wd:Q38104 .                # award received: Nobel Prize in Physics
  ?nobel wdt:P569 ?birthdate .               # date of birth
  OPTIONAL { ?nobel wdt:P570 ?deathdate . }  # date of death, if recorded
  # treat a missing death date as "still alive"
  FILTER(YEAR(?birthdate) <= 1950 &&
         (!BOUND(?deathdate) || YEAR(?deathdate) >= 1950))
}
But is there a way to get the count for every year between 1950 and 2000?
(I realize that I could get a table of birth and death dates for every winner and get my answer in another program, but I wonder if I could do it using only SPARQL.)
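For what it's worth, one way to stay entirely in SPARQL is to generate the year range inside the query and group on it. The sketch below is untested against the Wikidata endpoint; the two VALUES lists are just a trick to produce the integers 1950-2000 without listing all 51 of them:

SELECT ?year (COUNT(DISTINCT ?nobel) AS ?count)
WHERE
{
  # build the years 1950-2000 from a tens list and a units list
  VALUES ?tens  { 0 10 20 30 40 50 }
  VALUES ?units { 0 1 2 3 4 5 6 7 8 9 }
  BIND(1950 + ?tens + ?units AS ?year)
  FILTER(?year <= 2000)

  ?nobel wdt:P166 wd:Q38104 .                # award received: Nobel Prize in Physics
  ?nobel wdt:P569 ?birthdate .               # date of birth
  OPTIONAL { ?nobel wdt:P570 ?deathdate . }  # date of death, if recorded

  # alive in ?year: born no later than it and not dead before it
  FILTER(YEAR(?birthdate) <= ?year &&
         (!BOUND(?deathdate) || YEAR(?deathdate) >= ?year))
}
GROUP BY ?year
ORDER BY ?year

Each winner/year pair that survives the filters contributes to that year's count, so the GROUP BY returns one row per year from 1950 to 2000.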

Related

How to write an SQL query to get the maximum count of trips for the user who travelled the most within a month

I have been given a task by my manager to write an SQL query that selects the maximum count (number of records) for the user who has travelled the most within a month, provided that if the user travels to multiple places on the same date, that date is counted only once. For instance, if you look at the following table design, my query must return a count of 2: although traveller_id "1" has travelled three times within the month, he travelled to Thailand and the USA on the same date, so his count is reduced to 2.
I have also worked out the logic for this query but am unable to write it due to a lack of syntax knowledge. I split the query into 3 parts:
Select all records from the table within a month using the MONTH function of SQL.
Select all distinct DateTime records from the above result so that duplicate dates are eliminated.
Select the maximum count for the traveller who visited the most places.
Please help me complete my query. You can also use a different approach from mine.
You can use a COUNT aggregation in a CTE and then SELECT TOP (1):
with u as
(select traveller_id,
        count(distinct visit_date) as n   -- same-day trips collapse to one
 from travellers_log
 where visit_date between '2022-03-01' and '2022-03-31'
 group by traveller_id)
select top(1) u.traveller_id, name, n
from u
inner join table_travellers on u.traveller_id = table_travellers.id
order by n desc;

Find mean batting average ("Ave") and total number of centuries by country, but only including records for which the starting year is 2010 or later

I have a dataset of cricket players and want to find the mean batting average ("Ave") and the total number of centuries ("Hundreds") by country (a column), but only including records for which the starting year ("From") is 2010 or later.
Ave, Hundreds, Country, and From are the column names.
new_data.groupby(['Country'])['Ave'].mean()
new_data.groupby(['Country'])['Hundreds']
I want to apply these two in a single line and also use the condition that the starting year should be 2010 or later.
I am assuming the only value columns are Ave and Hundreds. You can do it by using the pandas .agg method.
grouped_data = new_data[new_data['From'] >= 2010].groupby(['Country'])  # use new_data['From'].dt.year if 'From' is a datetime column
grouped_data[['Ave', 'Hundreds']].agg(['mean', 'sum'])
Let me know if it doesn't work.
new_data[new_data['From'] >= 2010].groupby(['Country'])['Ave'].mean()
You can do the same for 'Hundreds', using .sum() instead of .mean().
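To do both in one statement, one option (a sketch with made-up rows; the column names follow the question, and the filter assumes 'From' holds plain integer years) is pandas named aggregation:

import pandas as pd

# hypothetical rows, only here to make the example self-contained
new_data = pd.DataFrame({
    "Country":  ["India", "India", "England"],
    "From":     [2011, 2008, 2015],
    "Ave":      [52.3, 44.1, 38.9],
    "Hundreds": [12, 9, 5],
})

result = (
    new_data[new_data["From"] >= 2010]           # keep careers starting in 2010 or later
    .groupby("Country")
    .agg(mean_ave=("Ave", "mean"),               # named aggregation: one statistic per column
         total_hundreds=("Hundreds", "sum"))
)
print(result)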

SPARQL search for first binding then stop

I have some stock data and I want to find the stock closing price two days following an event in which ?date was bound:
BIND(?date + "P2D"^^xsd:dayTimeDuration AS ?doe)
?event <http://www.foo.com/stock/date> ?doe .
?event <http://www.foo.com/stock/close> ?close .
I can think of ways to increment the 2, but I want to stop as soon as I get a value for ?close. I want to increment "trading days", not really calendar days.
Is there an elegant way to keep incrementing "P2D" but then stop when I get a value?
I'd do something like SELECT the closing values for every day in the week (or whatever the longest gap in trading is) starting 2 calendar days after ?doe, then ORDER BY date, and LIMIT 1.
Elegant? Maybe not. But no stepping, and should be fairly fast.
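In query form the idea might look like the sketch below (untested). It reuses the predicates from the question, binds a placeholder event date where ?date would normally come from the surrounding query, and searches a window of about a week so that weekends and holidays are bridged; note that dateTime-plus-duration arithmetic, as used in the question itself, is an engine extension rather than core SPARQL:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?doe ?close
WHERE
{
  # placeholder for the event date the surrounding query would bind
  BIND("2022-03-01T00:00:00"^^xsd:dateTime AS ?date)

  ?event <http://www.foo.com/stock/date>  ?doe .
  ?event <http://www.foo.com/stock/close> ?close .

  # window from 2 to 9 calendar days after ?date
  FILTER(?doe >= ?date + "P2D"^^xsd:dayTimeDuration &&
         ?doe <  ?date + "P9D"^^xsd:dayTimeDuration)
}
ORDER BY ?doe     # earliest trading day in the window first
LIMIT 1           # stop at the first close found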

MDX: divide each row by a value based on its parent

I am in a situation where I need to calculate a percentage for every fiscal year based on a distinct count of rows.
I have achieved the distinct count (a fairly simple task) for each year, city-wise, and arrived at these 2 listings in the cube.
The first listing is the state-wide distinct count for a given year.
The second listing is the city-wise distinct count for a given year, with a percentage based on the state-wide count for that year for that city.
My problem is that I need to prepare a calculated member for the percentage column for each given year.
For example, in year 2009, City 1 has a distinct count of 2697 and a percentage of 32.94% (formula used: 2697/8187).
I tried ([Measures].[Distinct Count])/(SUM(ROOT(),[Measures].[Distinct Count])) but no luck.
Any help is highly appreciated.
Thanks in advance.
PS: The city-wise sum for year 2009 can never equal the state-wide distinct count for that year, because we are calculating the distinct count for the city and the state separately.
You need to create a Region hierarchy for this, like State -> City. Then create a calculation like the one below. Then in the browser put your hierarchy on the left and the sales amount and the calculated percentage in values.
([Dim].[Region].CurrentMember, [Measures].[Salesamt]) /
iif(
    -- fall back to the member's own value when the parent total is 0, to avoid dividing by zero
    ([Dim].[Region].CurrentMember.Parent, [Measures].[Salesamt]) = 0,
    ([Dim].[Region].CurrentMember, [Measures].[Salesamt]),
    ([Dim].[Region].CurrentMember.Parent, [Measures].[Salesamt])
)
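Wrapped up as a query it would look roughly like the sketch below; the [Dim].[Region] hierarchy with State -> City levels, the [Salesamt] measure, and [YourCube] are placeholders following the answer's example and would need to match your own cube:

WITH MEMBER [Measures].[Pct Of State] AS
    ([Dim].[Region].CurrentMember, [Measures].[Salesamt]) /
    IIF(
        ([Dim].[Region].CurrentMember.Parent, [Measures].[Salesamt]) = 0,
        ([Dim].[Region].CurrentMember, [Measures].[Salesamt]),
        ([Dim].[Region].CurrentMember.Parent, [Measures].[Salesamt])
    ), FORMAT_STRING = 'Percent'
SELECT
    { [Measures].[Salesamt], [Measures].[Pct Of State] } ON COLUMNS,
    [Dim].[Region].[City].Members ON ROWS   -- assumes a City level in the hierarchy
FROM [YourCube]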

Using MIN on a DATEPART with GROUP BY not working, returns different dates

Can anyone help with an aggregate function, MIN?
I have a car table and I want to return the minimum sale price and minimum year from a table that has identical cars but with different years and prices.
Basically, if I remove Registration (which contains a year) from the GROUP BY and SELECT, the query works, but if I leave it in then I get 3 cars returned which are exactly the same model, make, etc. but with different years.
But I am using MIN, so it should return 1 car with the year 2006 (the minimum year among the 3 cars).
The MIN(SalePrice) is working perfectly; it's the registration that's not working.
Any ideas?
SELECT
MIN(datepart(year,[Registration])) AS YearRegistered,
MIN(SalePrice), Model, Make
FROM
[VehicleSales]
GROUP BY
datepart(year,[Registration]), Model, Make
If I have correctly understood what you are looking for, you should query:
SELECT Model, Make, MIN(datepart(year,[Registration])) AS YearRegistered, MIN(SalePrice)
FROM [VehicleSales]
GROUP BY Model, Make
Hope it helps.
Turro's answer will return the lowest registration year and the lowest price for each (Model, Make), but this doesn't mean that the lowest price will be for the car with the lowest year.
Is that what you need?
Or do you need one of these:
the lowest price among the cars having the lowest year
the lowest year among the cars having the lowest price
-- EDITED ---
You are correct about the query, but I want to find the car make/model that gets cheaper the next year ;)
That's why I made a comment. Imagine the following situation (make, model, year, price):
Porsche 911 2004 2000
Porsche 911 2004 3000
Porsche 911 2005 1000
Porsche 911 2005 5000
You'll get a result that won't really tell you whether this car gets cheaper from year to year or not:
Porsche 911 2004 1000
I don't know how you'd tell whether a car gets cheaper the next year based on one row, without at least a comparison with the previous year.
P.S. I'd like to buy one of the cars above for the listed price :D
You're getting what you're asking for: the cars are put into different groups whenever their model, make, or year is different, and the (minimum, i.e. only) year and minimum price for each of those groups is returned.
Why are you using GROUP BY?
You are correct about the query, but I want to find the car make/model that gets cheaper the next year ;)
You should find the cheapest (or average) price per make/model per year and compare it with the cheapest (or average) from the previous year (for the same make/model).
Then you can see which of them get cheaper the next year (I suppose most of them do).
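A minimal sketch of that year-over-year comparison (untested; it reuses the table and column names from the question and assumes SQL Server 2012 or later for LAG):

WITH yearly AS (
    SELECT Make, Model,
           DATEPART(year, [Registration]) AS RegYear,
           MIN(SalePrice)                 AS MinPrice
    FROM [VehicleSales]
    GROUP BY Make, Model, DATEPART(year, [Registration])
),
compared AS (
    SELECT Make, Model, RegYear, MinPrice,
           -- minimum price in the previous year (with data) for the same make/model
           LAG(MinPrice) OVER (PARTITION BY Make, Model
                               ORDER BY RegYear) AS PrevMinPrice
    FROM yearly
)
SELECT Make, Model, RegYear, MinPrice, PrevMinPrice
FROM compared
WHERE PrevMinPrice IS NOT NULL
  AND MinPrice < PrevMinPrice;   -- keep the years where the model got cheaper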