List of addresses for gambling in Bitcoin - bitcoin

I'd like to analyze the gambling activities in Bitcoin.
Does anyone have a list of addresses for gambling services such as SatoshiDICE and LuckyBit?
For example, I found addresses of SatoshiDICE here.
https://www.satoshidice.com/Bets.php

My suggestion would be to go and look for a list of popular addresses, i.e., addresses that received and/or sent a lot of transactions. Most gambling sites will use vanity addresses that include part of the site's name in the address, so you might also just search in the addresses for similar patterns.
It's rather easy to build such a list using Rusty Russell's bitcoin-iterate if you have a synced full node:
bitcoin-iterate --output "%os" -q > outputscripts.csv
This will get you a list of all output scripts in confirmed transactions in the blockchain. The output scripts include the pubkey hash that is also encoded in the address.
Let's keep only the P2PKH scripts of the form 76a914<pubkey-hash>88ac
grep -E '^76a914.*88ac$' outputscripts.csv > p2pkhoutputs.csv
Just for reference, 90.03% (484715631/538368714) of outputs are P2PKH scripts, so we should be getting pretty accurate results.
So let's count the occurrences of each output script:
sort p2pkhoutputs.csv | uniq -c | sort -g > uniqoutputscripts.csv
And finally, let's convert the scripts to addresses. We'll need to do the Base58Check encoding, and I chose the Python base58 library:
from base58 import b58encode_check

def script2address(script_hex):
    # Extract the 20-byte pubkey hash from the P2PKH script
    # (skip the OP_DUP OP_HASH160 0x14 prefix, drop the trailing OP_EQUALVERIFY OP_CHECKSIG).
    h = bytes.fromhex(script_hex)[3:23]
    # Prepend the mainnet version byte 0x00 and Base58Check-encode.
    return b58encode_check(b'\x00' + h)
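For illustration, here is a small sketch (assuming the "count script" line format that uniq -c produced above) that converts the busiest scripts to addresses:

# Convert the most frequent output scripts to addresses.
# Assumes lines in uniqoutputscripts.csv look like "1880739 76a914...88ac",
# sorted ascending by count, so the busiest scripts are at the end of the file.
with open('uniqoutputscripts.csv') as f:
    lines = f.readlines()

for line in lines[-10:]:
    count, script = line.split()
    print(count, script2address(script))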
For details on how addresses are generated, please refer to the Bitcoin wiki. And here are the top 10 addresses sorted by the number of incoming transactions:
1880739, 1NxaBCFQwejSZbQfWcYNwgqML5wWoE3rK4
1601154, 1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp
1194169, 1LuckyR1fFHEsXYyx5QK4UFzv3PEAepPMK
1105378, 1dice97ECuByXAvqXpaYzSaQuPVvrtmz6
595846, 1dice9wcMu5hLF4g81u8nioL5mmSHTApw
437631, 1dice7fUkz5h4z2wPc1wLMPWgB5mDwKDx
405960, 1MPxhNkSzeTNTHSZAibMaS8HS1esmUL1ne
395661, 1dice7W2AicHosf5EL3GFDUVga7TgtPFn
383849, 1LuckyY9fRzcJre7aou7ZhWVXktxjjBb9S
As you can see, SatoshiDice and LuckyBit are very much present in the set. Grepping for the vanity addresses unearths a lot of additional addresses too.

I would suggest using the usual chain-analysis approach: send money to these services and note the addresses they use. Then compute transitive, symmetric, etc. closures over those addresses in the blockchain transaction graph to obtain all addresses in their wallet.
No technique can determine the addresses in a wallet if the user is intelligent enough to mix properly.
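To make that closure step concrete, here is a minimal sketch (my own illustration, not part of the answer above) of the common-input-ownership heuristic: addresses spent together as inputs of the same transaction are merged into one cluster with a small union-find structure. The transactions variable is a hypothetical list of input-address lists, one per transaction.

parent = {}

def find(a):
    # Follow parent pointers to the cluster representative, with path compression.
    parent.setdefault(a, a)
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def union(a, b):
    parent[find(a)] = find(b)

def cluster(transactions):
    # transactions: hypothetical iterable of non-empty lists of input addresses.
    for inputs in transactions:
        find(inputs[0])              # register single-input transactions too
        for addr in inputs[1:]:
            union(inputs[0], addr)
    # Group every seen address under its cluster representative.
    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return clusters

Starting from the known gambling addresses, the clusters that contain them approximate the services' wallets; mixing and CoinJoin-style transactions will of course blur these boundaries.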

Related

Twitter Premium API Profile location operators profile_country: and profile_region: not working

I am using a premium account (not sandbox) for data collection.
I want to collect:
1. All tweets in English that contain 'china' or 'chinese' that are user-geolocated to the US and not geolocated at tweet level, excluding all retweets
2. All tweets in English that contain 'china' or 'chinese' that are user-geolocated to Minnesota and not geolocated at tweet level, excluding all retweets
The code is as follows:
from searchtweets import load_credentials, gen_rule_payload, ResultStream

premium_search_args = load_credentials('twitter_API.yaml',
                                       yaml_key='search_tweets_premium_api',
                                       env_overwrite=False)
# keywords for the search
# rule 1: user geolocated to the US, not geolocated at tweet level, no retweets
keywords = '(China OR Chinese) lang:en profile_country:US -place_country:US -is:retweet'
# rule 2: user geolocated to Minnesota, not geolocated at tweet level, no retweets
keywords = '(China OR Chinese) lang:en -place_country:US profile_region:"Minnesota" -is:retweet'
# define search rule
rule = gen_rule_payload(keywords, from_date='2019-12-01',
                        to_date='2019-12-10', results_per_call=500)
# create result stream and print before start
rs = ResultStream(rule_payload=rule, max_results=1250000,
                  **premium_search_args)
My problems are that:
For the first query, a large portion of the results I get doesn't satisfy the query. First, some tweets don't have the Profile Geo enrichment, i.e. the user.derived.locations attribute is not in the user object. Second, when it is present, many don't have country code US, i.e. they are attributed to other countries.
For the second query, the result I get is a smaller subset of what I can get from 1). That is, when I filter the profile_country:US results down to tweets user-geolocated to Minnesota (by user.derived.locations.region), I get a larger sample than when using profile_region:"Minnesota" directly. A considerable amount of data is missing with this method.
I have tried several times, but it seems that the user geolocation operators don't work exactly as I expect. Does anyone have any idea why this is the case? I would very much appreciate any answers/suggestions/comments.
Thank you!

Geolocation rtweet search_tweets R

Does the search_tweets function from the rtweet package have a parameter that allows me to filter by geolocation, equivalent to the search_tweets parameter "geocode = '40.757343,-73.848261,40km'"?
Not really; you need to download the tweets via rtweet::search_tweets and then impose a geographical cut on the coordinates. Furthermore, there is also a column entitled bounding_box_coordinates that can prove useful for your analysis.
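The cut itself is straightforward; below is a minimal sketch of such a radius filter, written in Python purely to illustrate the logic (the sample coordinates are hypothetical, and the 40 km radius matches the geocode example above):

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical (lat, lon) pairs extracted from the downloaded tweets.
tweets = [(40.76, -73.85), (41.90, -87.65)]
center = (40.757343, -73.848261)
# Keep only tweets within 40 km of the reference point.
nearby = [t for t in tweets if haversine_km(center[0], center[1], t[0], t[1]) <= 40]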

HL7 ORU sending edits

I use HL7 ORU message to send clinical notes. At present, I just send notes as they are created and saved. But now I need to support edit and delete of the notes and convey the same to the receiving system.
How can I achieve edit/delete with this? I use the ORU^R01 structure with an OBR segment and multiple OBX segments for my information. Thanks.
You will need to confirm with the receiving system how they want edits and deletes conveyed to them. But it is common to use the result status code in OBR-25 and/or the observation result status code in OBX-11.
For example, if the clinical note is edited (a.k.a. corrected or modified), send a C in OBR-25. If the clinical note is deleted, send an X in OBR-25. Ultimately you will need to coordinate with the receiving system; a small sketch of setting the field follows the list below.
FWIW, I commonly see these values in OBR-25:
P = preliminary
F = final
C = corrected / modified
X = cancelled / deleted / in-error
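As a rough illustration (a minimal sketch of my own, not an official HL7 library call), this is how one might set OBR-25 on a pipe-delimited OBR segment before resending the corrected or deleted note:

def set_obr_status(obr_segment, status):
    # Set OBR-25 (result status) on a pipe-delimited OBR segment.
    # Fields are 1-indexed after the segment name, so after splitting on '|'
    # (the default HL7 field separator) OBR-25 sits at index 25.
    fields = obr_segment.split('|')
    while len(fields) <= 25:          # pad short segments with empty fields
        fields.append('')
    fields[25] = status
    return '|'.join(fields)

# Example: mark a previously sent note as corrected ('C') or deleted ('X').
obr = 'OBR|1|12345||11488-4^Consult note^LN'   # hypothetical, truncated segment
corrected = set_obr_status(obr, 'C')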

Google Analytics API. Problems with two conditions using the metric transactionRevenue in one segment

I am using the Google Analytics API to automatically fetch stats from ecommerce sites. I need to query a dynamic segment with the sessions that spent more than 0 and less than 50 USD in ecommerce.
I tried this:
segment=users::condition::perSession::ga:transactionRevenue>0;users::condition::perSession::ga:transactionRevenue<50
But it looks like the API is ignoring the ga:transactionRevenue<50 condition, returning all the sessions with ga:transactionRevenue>0. I tried some other metrics in the >0 condition (like uniquePurchases, ga:transactionTax, ...) with the same result.
The fun part is that if I use transactionShipping it works OK (returning the sessions with purchases that include shipping costs and less than 50 USD revenue):
segment=users::condition::perSession::ga:transactionShipping >0;users::condition::perSession::ga:transactionRevenue<50
But this is not OK, because I need to include the free-shipping transactions in the segment.
Does anybody know a workaround for this?
Check how you are processing the segment before sending it to the API.
The rule is that you only need to escape a semicolon inside a value expression (\;). I suspect you are escaping the semicolon between the two conditions by accident, because you encode/escape everything after the 'segment=', like segment=<encoded/escaped segment definition>.
What you need to send is segment=<encoded/escaped condition1>;<encoded/escaped condition2>
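A minimal sketch of that encoding scheme (my own illustration of the idea, assuming you assemble the query string by hand with urllib):

from urllib.parse import quote

# Encode each condition separately, then join them with a literal,
# unescaped semicolon so the API still sees two ANDed conditions.
conditions = [
    'users::condition::perSession::ga:transactionRevenue>0',
    'users::condition::perSession::ga:transactionRevenue<50',
]
segment = ';'.join(quote(c, safe='') for c in conditions)
query = 'segment=' + segment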

MaxMind GeoLite2 returning wrong location

I'm getting the wrong location when I query the GeoLite2-City.mmdb database with ip = '104.6.30.56' (from Python). Their demo site returns good data for this IP (https://www.maxmind.com/en/geoip-demo).
In [33]: import geoip2.database
In [34]: reader = geoip2.database.Reader('.../GeoLite2-City.mmdb')
In [35]: reader.city('104.6.30.56').city # should be Santa Rosa, Ca
Out[35]: geoip2.records.City(geoname_id=None, confidence=None, _locales=['en'], names={})
In [36]: reader.city('104.6.30.56').location # should be ~(38, -122)
Out[36]: geoip2.records.Location(postal_confidence=None, average_income=None, accuracy_radius=None, time_zone=None, longitude=-97.0, metro_code=None, population_density=None, postal_code=None, latitude=38.0)
In [37]: reader.city('173.194.116.131').city # works fine for Google
Out[37]: geoip2.records.City(geoname_id=5375480, confidence=None, _locales=['en'], names={u'ru': u'\u041c\u0430\u0443\u043d\u0442\u0438\u043d-\u0412\u044c\u044e', u'fr': u'Mountain View', u'en': u'Mountain View', u'de': u'Mountain View', u'zh-CN': u'\u8292\u5ef7\u7ef4\u5c24', u'ja': u'\u30de\u30a6\u30f3\u30c6\u30f3\u30d3\u30e5\u30fc'})
Versions:
In [39]: reader.metadata()
Out[39]: maxminddb.reader.Metadata(binary_format_major_version=2, description={u'en': u'GeoLite2 City database'}, record_size=28, database_type=u'GeoLite2-City', languages=[u'de', u'en', u'es', u'fr', u'ja', u'pt-BR', u'ru', u'zh-CN'], build_epoch=1438796457, ip_version=6, node_count=3199926, binary_format_minor_version=0)
In [40]: geoip2.__version__
Out[40]: '2.2.0'
Is this because I'm using Lite version?
GeoIP location is only somewhat accurate.
Providers like MaxMind do their best to understand what IP address is associated with what geo location. However, that is a daunting task. IP addresses can be reassigned by the company that controls them, some companies do not publish the geography associated with an address, the IP you observe might belong to a proxy server far from the actual user, and there can be errors compiling the data.
Since their online system returns the correct geo location, this is probably an example of that final category.
In working extensively with geo location and correlating it to known facts about users, I observe that geo location databases are accurate around 85% - 90% of the time. Some providers do more than others to correctly handle the harder-to-handle IP addresses, but none of them are perfect.
If GeoIP returns the correct result and GeoLite does not, then yes, you're likely seeing the impact of the degraded accuracy of GeoLite. It's really a question of "do you want to pay, and if so, how much?"
Bear in mind that they recently introduced a third-level "Precision" service offering, of which the City database is itself now a degraded version.