RowNumber Window Query for Hiscores Ranking - Django - sql

I'm trying to build a game hiscore view with rankings for my Django site, and I'm having some issues.
The query I have is the following:
from django.db.models import F, Window
from django.db.models.functions import RowNumber

row_number_rank = Window(
    expression=RowNumber(),
    partition_by=[F('score_type')],
    order_by=F('score').desc()
)
hiscores = Hiscore.objects.annotate(rank=row_number_rank).values()
The query above works perfectly, and properly assigns each row a rank according to how it compares to other scores within each score type.
The result of this is the following:
{ 'id': 2, 'username': 'Bob', 'score_type': 'wins', 'score': 12, 'rank': 1 }
{ 'id': 1, 'username': 'John', 'score_type': 'wins', 'score': 5, 'rank': 2 }
{ 'id': 4, 'username': 'John', 'score_type': 'kills', 'score': 37, 'rank': 1 }
{ 'id': 3, 'username': 'John', 'score_type': 'kills', 'score': 5, 'rank': 2 }
{ 'id': 5, 'username': 'Bob', 'score_type': 'kills', 'score': 2, 'rank': 3 }
The issue comes in when I want to retrieve only a specific user's scores from the above results. If I append a filter(username="Bob") the query is now:
row_number_rank = Window(
    expression=RowNumber(),
    partition_by=[F('score_type')],
    order_by=F('score').desc()
)
hiscores = Hiscore.objects.annotate(rank=row_number_rank).filter(username='Bob').values()
Unexpectedly, adding this filter step has yielded the following incorrect results:
{ 'id': 2, 'username': 'Bob', 'score_type': 'wins', 'score': 12, 'rank': 1 }
{ 'id': 5, 'username': 'Bob', 'score_type': 'kills', 'score': 2, 'rank': 1 }
The rank on the id=5 entry has changed to 1 instead of its correct value of 3.
Why would adding this filter step modify the values of the fields in the QuerySet, instead of just excluding the proper elements from it?
Thanks.
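A likely explanation, sketched with plain SQL through Python's built-in sqlite3 module rather than Django: in SQL, the WHERE clause is applied before window functions are evaluated, so a filter in the same SELECT shrinks each partition before ROW_NUMBER() counts its rows. The table name hiscore and the alias row_rank are assumptions made for this illustration, and the snippet needs SQLite 3.25 or newer for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hiscore (id INTEGER, username TEXT, score_type TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO hiscore VALUES (?, ?, ?, ?)",
    [
        (1, "John", "wins", 5),
        (2, "Bob", "wins", 12),
        (3, "John", "kills", 5),
        (4, "John", "kills", 37),
        (5, "Bob", "kills", 2),
    ],
)

# Same ranking as the Window(...) expression in the question, written as SQL.
RANKED = """
SELECT id, username, score_type, score,
       ROW_NUMBER() OVER (PARTITION BY score_type ORDER BY score DESC) AS row_rank
FROM hiscore
"""

# Filter in the same SELECT: WHERE runs first, so only Bob's rows are ranked.
filtered_then_ranked = conn.execute(
    RANKED + " WHERE username = 'Bob' ORDER BY id"
).fetchall()

# Rank in a subquery, filter outside: ranks are computed over all users.
ranked_then_filtered = conn.execute(
    "SELECT * FROM (" + RANKED + ") WHERE username = 'Bob' ORDER BY id"
).fetchall()

print(filtered_then_ranked)
print(ranked_then_filtered)
```

Ranking in a subquery and filtering in the outer query keeps the original ranks. How best to express that through the ORM depends on the Django version in use.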

Related

Api gives wrong data

I use the Google AdWords API to collect information about the search volume for a specific keyword. But the data I get as a response doesn't match the data from the Keyword Planner or other keyword tools. Here I check the search volume for the keyword "Hunde" in Berlin, Germany, in German.
targeting_service = adwordsClient.GetService('TargetingIdeaService')
selector = {'ideaType': 'KEYWORD', 'requestType' : 'STATS'}
selector['requestedAttributeTypes'] = ['KEYWORD_TEXT', 'SEARCH_VOLUME', 'TARGETED_MONTHLY_SEARCHES']
offset = 0
selector['paging'] = {'startIndex' : str(offset), 'numberResults' : str(1)}
selector['searchParameters'] = [{
    'xsi_type': 'RelatedToQuerySearchParameter',
    'queries': ["hunde"]
}]
selector['searchParameters'].append({
    'xsi_type': 'LocationSearchParameter',
    'locations': [{'id': '1003854'}]
})
selector['searchParameters'].append({
    'xsi_type': 'LanguageSearchParameter',
    'languages': [{'id': '1001'}]
})
page = targeting_service.get(selector)
print(page)
As a response I get:
{
'totalNumEntries': 1,
'entries': [
{
'data': [
{
'key': 'KEYWORD_TEXT',
'value': {
'Attribute.Type': 'StringAttribute',
'value': 'hunde'
}
},
{
'key': 'TARGETED_MONTHLY_SEARCHES',
'value': {
'Attribute.Type': 'MonthlySearchVolumeAttribute',
'value': [
{
'year': 2020,
'month': 12,
'count': 4743382
},
{
'year': 2020,
'month': 11,
'count': 455583
},
{
'year': 2020,
'month': 10,
'count': 8797951
},
{
'year': 2020,
'month': 9,
'count': 5218694
},
{
'year': 2020,
'month': 8,
'count': 5089585
},
{
'year': 2020,
'month': 7,
'count': 3149591
},
{
'year': 2020,
'month': 6,
'count': 3020638
},
{
'year': 2020,
'month': 5,
'count': 4928527
},
{
'year': 2020,
'month': 4,
'count': 754959
},
{
'year': 2020,
'month': 3,
'count': 5649676
},
{
'year': 2020,
'month': 2,
'count': 1590789
},
{
'year': 2020,
'month': 1,
'count': 2506674
}
]
}
},
{
'key': 'SEARCH_VOLUME',
'value': {
'Attribute.Type': 'LongAttribute',
'value': 3825504
}
}
]
}
]
}
But this data doesn't match the data from the Keyword Planner.
Avg. monthly searches (Keyword Planner): 10K – 100K
Does anybody know why the data I'm receiving is wrong?
These questions pop up somewhat frequently and are generally not easy to answer. Did you make sure that the specified searchParameters in your request correspond exactly to what you are using in the Keyword Planner?
Additionally, you could check out the KeywordPlanService of the newer Ads API. According to this post by a Google Ads API advisor, it should be closer to what you can do in the web UI than the AdWords API's TargetingIdeaService.
In case the OP didn't figure it out: I've had the same headaches.
I solved this by adding a NetworkSearchParameter to searchParameters so that the API only returns Google Search data.
Here is my code after adding the additional parameter:
selector['searchParameters'] = [
    {
        'xsi_type': 'RelatedToQuerySearchParameter',
        'queries': sublist,
    },
    {
        'xsi_type': 'LocationSearchParameter',
        'locations': [country_ids[country]],
    },
    {
        'xsi_type': 'NetworkSearchParameter',
        'networkSetting': {
            'targetGoogleSearch': True,
            'targetSearchNetwork': False,
            'targetContentNetwork': False,
            'targetPartnerSearchNetwork': False
        }
    }
]
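As a rough sketch, the selector from the question and the NetworkSearchParameter from this answer could be combined in one helper. The function name build_selector and its arguments are hypothetical and not part of any Google library. The helper only builds the plain dictionary that the question passes to targeting_service.get(selector), and it has not been checked against a live API.

```python
def build_selector(queries, location_id, language_id, google_search_only=True):
    """Build a TargetingIdeaService selector dict (hypothetical helper)."""
    search_parameters = [
        {"xsi_type": "RelatedToQuerySearchParameter", "queries": list(queries)},
        {"xsi_type": "LocationSearchParameter", "locations": [{"id": str(location_id)}]},
        {"xsi_type": "LanguageSearchParameter", "languages": [{"id": str(language_id)}]},
    ]
    if google_search_only:
        # Restrict the statistics to Google Search, as suggested in the answer.
        search_parameters.append({
            "xsi_type": "NetworkSearchParameter",
            "networkSetting": {
                "targetGoogleSearch": True,
                "targetSearchNetwork": False,
                "targetContentNetwork": False,
                "targetPartnerSearchNetwork": False,
            },
        })
    return {
        "ideaType": "KEYWORD",
        "requestType": "STATS",
        "requestedAttributeTypes": [
            "KEYWORD_TEXT", "SEARCH_VOLUME", "TARGETED_MONTHLY_SEARCHES",
        ],
        "paging": {"startIndex": "0", "numberResults": str(len(search_parameters[0]["queries"]))},
        "searchParameters": search_parameters,
    }


# Same keyword, location, and language IDs as in the question.
selector = build_selector(["hunde"], 1003854, 1001)
print([p["xsi_type"] for p in selector["searchParameters"]])
```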

fetch the data from array of objects sql BigQuery

I need to fetch the key-value pairs from the second object in each array and create new columns from the fetched data. I am only interested in the second object; some arrays have 3 objects, some have 4, and so on. The data looks like this:
[{'adUnitCode': ca-pub, 'id': 35, 'name': ca-pub}, {'adUnitCode': hmies, 'id': 49, 'name': HMIES}, {'adUnitCode': moda, 'id': 50, 'name': moda}, {'adUnitCode': nova, 'id': 55, 'name': nova}, {'adUnitCode': listicle, 'id': 11, 'name': listicle}]
[{'adUnitCode': ca-pub, 'id': 35, 'name': ca-pub-73}, {'adUnitCode': hmiuk-jam, 'id': 23, 'name': HM}, {'adUnitCode': recipes, 'id': 26, 'name': recipes}]
[{'adUnitCode': ca-pub, 'id': 35, 'name': ca-pub-733450927}, {'adUnitCode': digital, 'id': 48, 'name': Digital}, {'adUnitCode': movies, 'id': 50, 'name': movies}, {'adUnitCode': cannes-film-festival, 'id': 57, 'name': cannes-film-festival}, {'adUnitCode': article, 'id': 57, 'name': article}]
The desired output:
adUnitCode  id  name
hmies       49  HMIES
hmiuk-jam   23  HM
digital     48  Digital
Below is for BigQuery Standard SQL
#standardSQL
select
json_extract_scalar(second_object, "$.adUnitCode") as adUnitCode,
json_extract_scalar(second_object, "$.id") as id,
json_extract_scalar(second_object, "$.name") as name
from `project.dataset.table`, unnest(
[json_extract_array(regexp_replace(mapping, r"(: )([\w-]+)(,|})", "\\1'\\2'\\3"))[safe_offset(1)]]
) as second_object
Applied to the sample data from your question, the output matches the desired output above.
The "trick" here is to use a proper regular expression in regexp_replace to wrap the unquoted values in quotes before passing the string to json_extract_array. The pattern above accepts word characters and hyphens ([\w-]); you can include more characters as needed.
As an alternative, you can try regexp_replace(mapping, r"(: )([^,}]+)", "\\1'\\2'"), as in the example below, which should cover more cases without further changes to the code:
#standardSQL
select
json_extract_scalar(second_object, "$.adUnitCode") as adUnitCode,
json_extract_scalar(second_object, "$.id") as id,
json_extract_scalar(second_object, "$.name") as name
from `project.dataset.table`, unnest(
[json_extract_array(regexp_replace(mapping, r"(: )([^,}]+)", "\\1'\\2'"))[safe_offset(1)]]
) as second_object
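To see what the first regular expression does outside BigQuery, here is a minimal Python sketch of the same idea. It is not the BigQuery implementation: Python's re.sub stands in for regexp_replace, ast.literal_eval stands in for json_extract_array, and the function name second_object is made up for this example. As with json_extract_scalar, the id comes back as a string.

```python
import ast
import re


def second_object(mapping):
    """Return the second object of the array as a dict, or None if there is none."""
    # Wrap bare values such as ca-pub or 35 in single quotes.
    quoted = re.sub(r"(: )([\w-]+)(,|})", r"\1'\2'\3", mapping)
    # The quoted string is now a valid Python literal (a list of dicts).
    objects = ast.literal_eval(quoted)
    # Mirror safe_offset(1): no error when the array is too short.
    return objects[1] if len(objects) > 1 else None


row = (
    "[{'adUnitCode': ca-pub, 'id': 35, 'name': ca-pub-73}, "
    "{'adUnitCode': hmiuk-jam, 'id': 23, 'name': HM}, "
    "{'adUnitCode': recipes, 'id': 26, 'name': recipes}]"
)
print(second_object(row))
```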

Pandas Groupby: return dict of rows

I would like to group my dataframe by one of the columns and then return a dictionary that has a list of all of the rows per column value. Is there a fast Pandas idiom for doing this?
Example:
test = pd.DataFrame({
'id': ['alice', 'bob', 'bob', 'charlie'],
'transaction_date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
'amount': [50.0, 10.0, 12.0, 13.0]
})
Desired output:
result = {
    'alice': [Series(transaction_date='2020-01-01', amount=50.0)],
    'bob': [Series(transaction_date='2020-01-01', amount=10.0), Series(transaction_date='2020-01-02', amount=12.0)],
    'charlie': [Series(transaction_date='2020-01-02', amount=13.0)],
}
The following approaches do NOT work:
test.groupby('id').agg(list)
Returns a Dataframe where each column (amount and transaction_date) has a list of values, but that's not what I want. I want the result to be one list of rows / Pandas series per unique grouping column value ('id' value).
test.groupby('id').agg(list).to_dict():
{'amount': {'charlie': [13.0], 'bob': [10.0, 12.0], 'alice': [50.0]}, 'transaction_date': {'charlie': ['2020-01-02'], 'bob': ['2020-01-01', '2020-01-02'], 'alice': ['2020-01-01']}}
test.groupby('id').apply(list).to_dict():
{'charlie': ['amount', 'id', 'transaction_date'], 'bob': ['amount', 'id', 'transaction_date'], 'alice': ['amount', 'id', 'transaction_date']}
Use itertuples and zip:
import pandas as pd
test = pd.DataFrame({
'id': ['alice', 'bob', 'bob', 'charlie'],
'transaction_date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
'amount': [50.0, 10.0, 12.0, 13.0]
})
columns = ['transaction_date', 'amount']
grouped = (test
.groupby('id')[columns]
.apply(lambda x: list(x.itertuples(name='Series', index=False))))
print(dict(zip(grouped.index, grouped.values)))
{
'alice': [Series(transaction_date='2020-01-01', amount=50.0)],
'bob': [
Series(transaction_date='2020-01-01', amount=10.0),
Series(transaction_date='2020-01-02', amount=12.0)
],
'charlie': [Series(transaction_date='2020-01-02', amount=13.0)]
}
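An alternative sketch, not taken from the answer above, iterates over the groups directly in a dict comprehension. The name Row for the named tuples and the variable rows_by_id are arbitrary choices for this illustration.

```python
import pandas as pd

test = pd.DataFrame({
    'id': ['alice', 'bob', 'bob', 'charlie'],
    'transaction_date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
    'amount': [50.0, 10.0, 12.0, 13.0]
})
columns = ['transaction_date', 'amount']

# Iterating over a groupby yields (key, sub-DataFrame) pairs;
# each sub-DataFrame is turned into a list of named tuples.
rows_by_id = {
    key: list(group[columns].itertuples(index=False, name='Row'))
    for key, group in test.groupby('id')
}
print(rows_by_id['bob'])
```

Each row is a named tuple rather than a pandas Series, so fields are read as attributes, for example rows_by_id['bob'][1].amount.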

pandas: aggregate array during groupby, equivalent of SQL's array_agg?

I've got this dataframe:
df1 = pd.DataFrame([
{ 'id': 1, 'spend': 60, 'store': 'Stockport' },
{ 'id': 2, 'spend': 68, 'store': 'Didsbury' },
{ 'id': 3, 'spend': 70, 'store': 'Stockport' },
{ 'id': 4, 'spend': 35, 'store': 'Didsbury' },
{ 'id': 5, 'spend': 16, 'store': 'Didsbury' },
{ 'id': 6, 'spend': 12, 'store': 'Didsbury' },
])
I've grouped it by store and got the total spend by store:
df1.groupby("store").agg({'spend': 'sum'})\
    .reset_index().sort_values("spend", ascending=False)
store      spend
Didsbury   131
Stockport  130
Is there a way I can get the IDs for each store as a column in the grouped object? Like the equivalent of ARRAY_AGG in Postgres? So the desired output would be:
store      spend  ids
Didsbury   131    [2,4,5,6]
Stockport  130    [1,3]
We can use named aggregation, which has been available since pandas 0.25.0.
Notice how we can instantly rename our column to "ids":
df1.groupby('store').agg(
    spend=('spend', 'sum'),
    ids=('id', list)
).reset_index()
       store  spend           ids
0   Didsbury    131  [2, 4, 5, 6]
1  Stockport    130        [1, 3]
You can pass list as the aggregation function for the id column:
df = (df1.groupby("store").agg({'spend': 'sum', 'id': list})
         .reset_index()
         .sort_values("spend", ascending=False))
print(df)
       store  spend            id
0   Didsbury    131  [2, 4, 5, 6]
1  Stockport    130        [1, 3]

pandas same attribute comparison

I have the following dataframe:
df = pd.DataFrame([{'name': 'a', 'label': 'false', 'score': 10},
{'name': 'a', 'label': 'true', 'score': 8},
{'name': 'c', 'label': 'false', 'score': 10},
{'name': 'c', 'label': 'true', 'score': 4},
{'name': 'd', 'label': 'false', 'score': 10},
{'name': 'd', 'label': 'true', 'score': 6},
])
I want to return the names whose "false" label score is at least double the score of the "true" label. In my example, it should return only the "c" name.
First pivot the data, then filter on the ratio of the two columns:
new_df = df.pivot(index='name', columns='label', values='score')
new_df[new_df['false'].div(new_df['true']).ge(2)]
output:
label  false  true
name
c         10     4
If you only want the names, you can do:
new_df.index[new_df['false'].div(new_df['true']).ge(2)].values
which gives
array(['c'], dtype=object)
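Update: Since your data is the result of orig_df.groupby().count(), you could instead do:
orig_df['label'].eq('true').groupby(orig_df['name']).mean()
and look at the rows with values <= 1/3. This is the share of "true" rows per name, and a share of at most 1/3 means the "false" count is at least twice the "true" count.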
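The idea in the update can be tried on raw data with a small sketch. The contents of orig_df below are invented for illustration (one row per observation, with counts chosen to match names "a" and "c" from the question), and the grouping key is passed as a Series so that it works on a boolean Series.

```python
import pandas as pd

# Hypothetical raw data before any groupby().count():
# "a" has 10 false / 8 true rows, "c" has 10 false / 4 true rows.
orig_df = pd.DataFrame({
    "name": ["a"] * 18 + ["c"] * 14,
    "label": ["false"] * 10 + ["true"] * 8 + ["false"] * 10 + ["true"] * 4,
})

# Share of "true" rows per name: the mean of a boolean Series is the fraction of True.
true_share = orig_df["label"].eq("true").groupby(orig_df["name"]).mean()

# false >= 2 * true is equivalent to true / (true + false) <= 1/3.
names = true_share.index[true_share <= 1 / 3].tolist()
print(names)
```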