I am using a premium account (not sandbox) for data collection.
I want to collect:
All tweets in English that contain ‘china’ or ‘chinese’ that are user geolocated to US and not geolocated at tweet level, excluding all retweets
All tweets in English that contain ‘china’ or ‘chinese’ that are user geolocated to ‘Minnesota’ and not geolocated at tweet level, excluding all retweets
The code is as follows:
from searchtweets import load_credentials, gen_rule_payload, ResultStream

premium_search_args = load_credentials('twitter_API.yaml',
                                       yaml_key='search_tweets_premium_api',
                                       env_overwrite=False)
# keywords for the search
# query 1: users geolocated to the US
query_us = '(China OR Chinese) lang:en profile_country:US -place_country:US -is:retweet'
# query 2: users geolocated to Minnesota
query_mn = '(China OR Chinese) lang:en -place_country:US profile_region:"Minnesota" -is:retweet'
# pick the query to run (the two were run one at a time)
keywords = query_mn
# define search rule
rule = gen_rule_payload(keywords, from_date='2019-12-01',
                        to_date='2019-12-10', results_per_call=500)
# create result stream
rs = ResultStream(rule_payload=rule, max_results=1250000,
                  **premium_search_args)
My problems are that:
For the first one, a large portion of the results I get don’t satisfy the query. First, some don’t have the Profile Geo enrichment, i.e. the user.derived.locations attribute is missing from the user object. Second, when it is present, many don’t have country code US, i.e. they are assigned to other countries.
For the second one, the results I get are a smaller subset of what I can get from the first query. That is, when I filter the profile_country:US results down to users geolocated to Minnesota (by user.derived.locations.region), I get a larger sample than with profile_region:"Minnesota". A considerable amount of data is missing with this method.
I have tried several times, but it seems the user geolocation operators don’t do exactly what I want. Does anyone have any idea why this is the case? I would very much appreciate any answers/suggestions/comments.
Thank you!
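For reference, the client-side check described above (looking at user.derived.locations on each returned tweet) can be sketched roughly as follows. This is only a minimal sketch: the helper name profile_geo_matches and the sample payloads are made up for illustration, and it assumes each tweet is a plain dict with country_code and region keys inside user.derived.locations.

```python
def profile_geo_matches(tweet, country_code=None, region=None):
    """Return True if any Profile Geo location on the tweet's user matches.

    Tweets without user.derived.locations never match.
    """
    locations = tweet.get("user", {}).get("derived", {}).get("locations", [])
    for loc in locations:
        if country_code is not None and loc.get("country_code") != country_code:
            continue
        if region is not None and loc.get("region") != region:
            continue
        return True
    return False


# Hypothetical payloads, trimmed to the fields used here.
sample_tweets = [
    {"id": 1, "user": {"derived": {"locations": [{"country_code": "US", "region": "Minnesota"}]}}},
    {"id": 2, "user": {"derived": {"locations": [{"country_code": "US", "region": "Texas"}]}}},
    {"id": 3, "user": {"derived": {"locations": [{"country_code": "GB", "region": "England"}]}}},
    {"id": 4, "user": {}},  # no Profile Geo enrichment at all
]

us_ids = [t["id"] for t in sample_tweets
          if profile_geo_matches(t, country_code="US")]
mn_ids = [t["id"] for t in sample_tweets
          if profile_geo_matches(t, country_code="US", region="Minnesota")]
print(us_ids, mn_ids)
```

Counting how many returned tweets fail this check for each query gives a concrete number to compare against what the operators were expected to return.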
Does the search_tweets function from the rtweet package have parameters that allow me to filter by geolocation, equivalent to the search_tweets parameter "geocode = '40.757343,-73.848261,40km'"?
Not really; you need to download the tweets via rtweet::search_tweets and then impose a geographical cut on the coordinates. There is also a column called bounding_box_coordinates that may prove useful for your analysis.
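The "geographical cut on coordinates" can be sketched as a great-circle distance filter. The example below is written in Python rather than R and is only an illustration: the row layout (dicts with lat and lon keys) and the function names are assumptions, not rtweet's output format.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def within_radius(rows, center_lat, center_lon, radius_km):
    """Keep rows that have coordinates and lie within radius_km of the centre."""
    kept = []
    for row in rows:
        if row.get("lat") is None or row.get("lon") is None:
            continue  # tweets without coordinates cannot be placed
        if haversine_km(center_lat, center_lon, row["lat"], row["lon"]) <= radius_km:
            kept.append(row)
    return kept


# Hypothetical rows standing in for downloaded tweets.
rows = [
    {"id": "a", "lat": 40.7580, "lon": -73.8500},   # close to the centre
    {"id": "b", "lat": 41.8781, "lon": -87.6298},   # Chicago, far away
    {"id": "c", "lat": None, "lon": None},          # no coordinates
]
nearby = within_radius(rows, 40.757343, -73.848261, 40)
print([r["id"] for r in nearby])
```

Rows without coordinates are dropped here; whether to fall back on the bounding box for those is a separate choice.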
I use HL7 ORU messages to send clinical notes. At present, I just send notes as they are created and saved. But now I need to support editing and deleting notes and convey these changes to the receiving system.
How can I achieve edit / delete with this? I use ORU^R01 structure and use OBR and multiple OBX segments for my information. Thanks.
You will need to confirm with the receiving system how they want edits and deletes conveyed to them. But it is common to use the result status code in OBR-25 and/or the observation result status code in OBX-11.
For example, if the clinical note is edited (i.e. corrected or modified), send a C in OBR-25. If the clinical note is deleted, send an X in OBR-25. Ultimately you will need to coordinate with the receiving system.
FWIW, I commonly see these values in OBR-25:
P = preliminary
F = final
C = corrected / modified
X = cancelled / deleted / in-error
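As a rough illustration of setting the status, here is a minimal Python sketch that writes a value into OBR-25 of a pipe-delimited message. The helper names and the sample segments are hypothetical, it assumes the default | field separator, and it leaves OBX-11 untouched unless a value is passed explicitly, since the codes expected there should be agreed with the receiving system.

```python
def set_field(segment, index, value, sep="|"):
    """Set field number `index` of a segment (OBR-25 is index 25)."""
    fields = segment.split(sep)
    while len(fields) <= index:
        fields.append("")  # pad so the target field exists
    fields[index] = value
    return sep.join(fields)


def mark_note(segments, obr_status, obx_status=None):
    """Return a copy of the segments with OBR-25 (and optionally OBX-11) set."""
    out = []
    for seg in segments:
        if seg.startswith("OBR|"):
            seg = set_field(seg, 25, obr_status)
        elif seg.startswith("OBX|") and obx_status is not None:
            seg = set_field(seg, 11, obx_status)
        out.append(seg)
    return out


# Hypothetical segments, not taken from a real interface.
note = [
    "OBR|1|||NOTE^Clinical note^L",
    "OBX|1|TX|NOTE^Clinical note^L||Patient seen today.||||||F",
]
corrected = mark_note(note, "C")
deleted = mark_note(note, "X")
print(corrected[0])
```

The same note identifiers would normally be resent with the new status so the receiver can match the update to the original; which fields carry that identity is again something to confirm with the receiving system.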
I am using the Google Analytics API to automatically fetch stats from e-commerce sites. I need to query a dynamic segment with the sessions that spent more than 0 and less than 50 USD in e-commerce.
I tried this:
segment=users::condition::perSession::ga:transactionRevenue>0;users::condition::perSession::ga:transactionRevenue<50
But it looks like the API is ignoring the ga:transactionRevenue<50 condition and returning all the sessions with ga:transactionRevenue>0. I tried some other metrics in the >0 condition (like ga:uniquePurchases, ga:transactionTax...) with the same results.
The fun part is that if I use ga:transactionShipping it works OK (returning the sessions with purchases that include shipping costs and have less than 50 USD revenue):
segment=users::condition::perSession::ga:transactionShipping >0;users::condition::perSession::ga:transactionRevenue<50
But this is not OK, because I need to include the free-shipping transactions in the segment.
Does anybody know a workaround for this?
Check how you are processing the segment for sending to the API.
The rule is that you should escape a semicolon in a value expression (\;). I suspect you are accidentally escaping the semicolon between the conditions because you encode/escape everything after 'segment=', like segment=<encoded/escaped segment definition>.
What you need to send is segment=<encoded/escaped condition1>;<encoded/escaped condition2>
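A minimal Python sketch of that idea: each condition is percent-encoded on its own and the pieces are then joined with a literal semicolon. The function name build_segment_param is made up for illustration, and the sketch assumes the conditions contain no literal semicolons inside their values (those would need the \; escape first).

```python
from urllib.parse import quote


def build_segment_param(conditions):
    """Encode each condition separately, then join with an unencoded ';'."""
    encoded = [quote(cond, safe="") for cond in conditions]
    return "segment=" + ";".join(encoded)


conditions = [
    "users::condition::perSession::ga:transactionRevenue>0",
    "users::condition::perSession::ga:transactionRevenue<50",
]
param = build_segment_param(conditions)
print(param)
```

Note that many HTTP client libraries encode query parameters again when given a dict of params, which would turn the separator into %3B; in that case the pre-built string has to be appended to the URL as-is.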
I'm getting the wrong location when I query the GeoLite2-City.mmdb database with ip = '104.6.30.56' (from Python). Their demo site returns good data for this IP (https://www.maxmind.com/en/geoip-demo).
In [33]: import geoip2.database
In [34]: reader = geoip2.database.Reader('.../GeoLite2-City.mmdb')
In [35]: reader.city('104.6.30.56').city # should be Santa Rosa, Ca
Out[35]: geoip2.records.City(geoname_id=None, confidence=None, _locales=['en'], names={})
In [36]: reader.city('104.6.30.56').location # should be ~(38, -122)
Out[36]: geoip2.records.Location(postal_confidence=None, average_income=None, accuracy_radius=None, time_zone=None, longitude=-97.0, metro_code=None, population_density=None, postal_code=None, latitude=38.0)
In [37]: reader.city('173.194.116.131').city # works fine for Google
Out[37]: geoip2.records.City(geoname_id=5375480, confidence=None, _locales=['en'], names={u'ru': u'\u041c\u0430\u0443\u043d\u0442\u0438\u043d-\u0412\u044c\u044e', u'fr': u'Mountain View', u'en': u'Mountain View', u'de': u'Mountain View', u'zh-CN': u'\u8292\u5ef7\u7ef4\u5c24', u'ja': u'\u30de\u30a6\u30f3\u30c6\u30f3\u30d3\u30e5\u30fc'})
Versions:
In [39]: reader.metadata()
Out[39]: maxminddb.reader.Metadata(binary_format_major_version=2, description={u'en': u'GeoLite2 City database'}, record_size=28, database_type=u'GeoLite2-City', languages=[u'de', u'en', u'es', u'fr', u'ja', u'pt-BR', u'ru', u'zh-CN'], build_epoch=1438796457, ip_version=6, node_count=3199926, binary_format_minor_version=0)
In [40]: geoip2.__version__
Out[40]: '2.2.0'
Is this because I'm using the Lite version?
GeoIP location is only somewhat accurate.
Providers like MaxMind do their best to understand what IP address is associated with what geo location. However, that is a daunting task. IP addresses can be reassigned by the company that controls them, some companies do not publish the geography associated with an address, the IP you observe might belong to a proxy server far from the actual user, and there can be errors compiling the data.
Since their online system returns the correct geo location, this is probably an example of that final category.
In working extensively with geo location and correlating it to known facts about users, I have observed that geo location databases are accurate around 85% to 90% of the time. Some providers do more than others to correctly handle the harder-to-handle IP addresses, but none of them are perfect.
If GeoIP returns the correct result and GeoLite does not, then yes, you're likely seeing the impact of the degraded accuracy of GeoLite. It's really a question of "do you want to pay, and if so, how much?"
Bear in mind that they recently introduced a third-level "Precision" service offering, of which the City database is itself now a degraded version.
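In the question's output the City record is empty (geoname_id=None, names={}) while a latitude and longitude are still returned; the coordinates (38.0, -97.0) look like a country-level placeholder for the US rather than a city fix, though that reading is an assumption. A minimal sketch for flagging such lookups before trusting the coordinates, where resolution_level is a hypothetical helper and the stand-in objects only mimic the city.geoname_id and country.geoname_id attributes of a geoip2 response:

```python
from types import SimpleNamespace


def resolution_level(response):
    """Classify a lookup by the most specific record that is populated."""
    if response.city.geoname_id is not None:
        return "city"
    if response.country.geoname_id is not None:
        return "country"
    return "unknown"


# Stand-ins for geoip2 responses; the country id below is a placeholder.
# With a real database:
#   reader = geoip2.database.Reader('.../GeoLite2-City.mmdb')
#   resolution_level(reader.city('104.6.30.56'))
city_hit = SimpleNamespace(
    city=SimpleNamespace(geoname_id=5375480),
    country=SimpleNamespace(geoname_id=1),
)
country_only = SimpleNamespace(
    city=SimpleNamespace(geoname_id=None),
    country=SimpleNamespace(geoname_id=1),
)
print(resolution_level(city_hit), resolution_level(country_only))
```

Lookups that come back as "country" can then be excluded from any city-level analysis instead of being plotted at the placeholder coordinates.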