Is it possible to get the full message history or a message count from any chat via the Telegram API?

I need to calculate the message count in Telegram or get the message history.
Tell me, is it possible to do this?
I know that I can see the message count for a chat with some member in Telegram Desktop. Can I do this for any conversation?
Thank you!

You can count messages in Telegram, retrieve message history, and do plenty of other things with the Telegram API. Here is a brilliant article that describes the process step by step: https://towardsdatascience.com/introduction-to-the-telegram-api-b0cd220dbed2
It worked like a charm for me, both for retrieving messages and for counting them.
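The snippet below assumes you already have an authorized client and a dialogs result. Here is a minimal setup sketch, assuming the synchronous Telethon API that the article uses (api_id and api_hash are your own credentials from my.telegram.org):
from telethon import TelegramClient
from telethon.tl.functions.messages import GetDialogsRequest
from telethon.tl.types import InputPeerEmpty

# api_id / api_hash come from my.telegram.org
client = TelegramClient('session_name', api_id, api_hash)
client.start()  # prompts for your phone number and login code on first run

# fetch up to 200 dialogs (conversations) in one request
dialogs = client(GetDialogsRequest(
    offset_date=None,
    offset_id=0,
    offset_peer=InputPeerEmpty(),
    limit=200,
    hash=0,
))
With client and dialogs in place, let's look at the code from the article: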
from telethon.tl.types import (PeerChannel, PeerChat, PeerUser,
                               InputPeerChannel, InputPeerChat, InputPeerUser)
from telethon.tl.functions.messages import GetHistoryRequest
from telethon.tl.types.messages import Messages

counts = {}
# create dictionaries of ids to users and chats
users = {}
chats = {}
for u in dialogs.users:
    users[u.id] = u
for c in dialogs.chats:
    chats[c.id] = c

for d in dialogs.dialogs:
    peer = d.peer
    if isinstance(peer, PeerChannel):
        id = peer.channel_id
        channel = chats[id]
        access_hash = channel.access_hash
        name = channel.title
        input_peer = InputPeerChannel(id, access_hash)
    elif isinstance(peer, PeerChat):
        id = peer.chat_id
        group = chats[id]
        name = group.title
        input_peer = InputPeerChat(id)
    elif isinstance(peer, PeerUser):
        id = peer.user_id
        user = users[id]
        access_hash = user.access_hash
        name = user.first_name
        input_peer = InputPeerUser(id, access_hash)
    else:
        continue

    # limit=1 keeps the response small; the total is still reported
    get_history = GetHistoryRequest(
        peer=input_peer,
        offset_id=0,
        offset_date=None,
        add_offset=0,
        limit=1,
        max_id=0,
        min_id=0,
    )
    history = client(get_history)
    if isinstance(history, Messages):
        # a plain Messages result contains every message, so count them directly
        count = len(history.messages)
    else:
        # a MessagesSlice / ChannelMessages result carries the total in .count
        count = history.count
    counts[name] = count

print(counts)
Let's add sorting:
sorted_counts = sorted(counts.items(), key=lambda x: x[1], reverse=True)
for name, count in sorted_counts:
    print('{}: {}'.format(name, count))
Running this gives a message-count result like:
Group chat 1: 10000
Group chat 2: 3003
Channel 1: 2000
Chat 1: 1500
Chat 2: 300
P.S. Here is a simple (manual) way to export some chats, described in the official wiki, but from my perspective it is very limited and unsuitable for programmatic purposes:
https://telegram.wiki/general/exporting-chats

Related

Geocoding, iterrows() and itertuples do not get the job done for a larger DataFrame

I'm trying to add coordinates to a set of addresses that are saved in an Excel file, using the Google geocoder API. See the code below:
for i, row in df.iterrows():
    # combine the address columns into one variable to push to the geocoder API
    apiAddress = str(df.at[i, 'adresse1']) + ',' + str(df.at[i, 'postnr']) + ',' + str(df.at[i, 'By'])
    # a dictionary with the API key and the address info, pushed to the geocoder API on each iteration
    parameters = {
        'key': API_KEY,
        'address': apiAddress
    }
    # response from the API, based on the input url + the dictionary above
    response = requests.get(base_url, params=parameters).json()
    # the response is a dictionary; this accesses its geometry part
    geometry = response['results'][0]['geometry']
    # within the geometry part, access the lat and lng respectively
    lat = geometry['location']['lat']
    lng = geometry['location']['lng']
    # append the lat / lng to new columns in the dataframe for each iteration
    df.at[i, 'Geo_Lat_New'] = lat
    df.at[i, 'Geo_Lng_New'] = lng

# print the first 10 rows
print(df.head(10))
The above code works perfectly fine for 20 addresses. But when I try to run it on the entire dataset of 90000 addresses using iterrows(), I get an IndexError:
File "C:\Users\...", line 29, in <module>
    geometry = response['results'][0]['geometry']
IndexError: list index out of range
Using itertuples() instead, with:
for i, row in df.itertuples():
I get a ValueError:
File "C:\Users\...", line 22, in <module>
for i, row in df.itertuples():
ValueError: too many values to unpack (expected 2)
when I use:
for i in df.itertuples():
I get a complicated KeyError that is too long to include here.
Any suggestions on how to properly add coordinates for each address in the entire dataframe?
Update: in the end I found out what the issue was. The Google geocoding API only handles 50 requests per second. Therefore I used the following code to take a 1-second break after every 49 requests:
if count == 49:
    print('Taking a 1 second break, total count is:', total_count)
    time.sleep(1)
    count = 0
Here count keeps track of the number of loops; as soon as it hits 49, the if statement above is executed, taking a 1-second break and resetting the count back to zero.
Although you have already found the error (the Google API limits the number of requests that can be made), it usually isn't good practice to use for loops with pandas. Therefore, I would rewrite your code to take advantage of pd.DataFrame.apply:
import time

import pandas as pd
import requests

def get_geometry(row: pd.Series, API_KEY: str, base_url: str, tries: int = 0):
    apiAddress = ",".join([str(row["adresse1"]), str(row["postnr"]), str(row["By"])])
    parameters = {"key": API_KEY, "address": apiAddress}
    try:
        response = requests.get(base_url, params=parameters).json()
        geometry = response["results"][0]["geometry"]
    except IndexError:  # the rate limit was reached
        # sleep so the next 50 requests can go through, but
        # beware that consistently reaching the limit could
        # further restrict your requests.
        # this is why you may want to keep track of how many
        # tries you have already made, so the process stops
        # once a threshold has been met
        if tries > 3:  # tries > arbitrary threshold
            raise
        time.sleep(1)
        return get_geometry(row, API_KEY, base_url, tries + 1)
    return geometry["location"]["lat"], geometry["location"]["lng"]

# pass kwargs to the applied function and iterate over every row
lat_lon = df.apply(get_geometry, API_KEY=API_KEY, base_url=base_url, axis=1)
df["Geo_Lat_New"] = lat_lon.apply(lambda latlon: latlon[0])
df["Geo_Lng_New"] = lat_lon.apply(lambda latlon: latlon[1])

Store targets as collections that handle logic operation

I think my title is a bit unclear, but I don't know how to put it otherwise.
My problem is:
We have users that belong to groups; there are many types of groups, and every user belongs to exactly one group of each type.
Example: with group types A, B and C, containing respectively the groups (A1; A2; A3), (B1; B2) and (C1; C2; C3),
every user must have a list of groups like [A1, B1, C1] or [A1, B2, C3], but never [A1, A2, B1] or [A1, C2].
We have messages that target certain groups, but not just as a union; the targeting can involve more complex set operations.
Example: we can have messages intended for [A1, B1, C3], [A1, *, *], [A1|A2, *, *] or even ([A1, B1, C2] | [A2, B2, C1])
(* = any group of the type, | = or).
Messages are stored in a SQL DB, and users can retrieve all messages intended for their groups.
How should I store messages and write my query to reproduce this behavior?
An option could be to encode both the user's groups and the message targets in a (big) integer built on powers of 2, and then base your query on a bitwise AND between the user's group code and the message's target code.
The idea is: group 1 is 1, group 2 is 2, group 3 is 4, and so on.
Level 1:
Assumptions:
you know in advance how many group types you have, and you have very few of them
you don't have more than 64 groups per type (assuming you work with 64-bit integers)
the message has only one target: A1|A2,B..,C... is ok, A*,B...,C... is ok, (A1,B1,C1)|(A2,B2,C2) is not.
Solution:
Encode each user group as the corresponding power of 2
Encode each message target as the sum of the allowed values: if groups 1 and 3 are allowed (A1|A3) the code will be 1+4=5, if all groups are allowed (A*) the code will be 2**64-1
you will have a User table and a Message table, and both will have one field for each group type code
The query will be WHERE (u.g1 & m.g1) * (u.g2 & m.g2) * ... * (u.gN & m.gN) <> 0
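As a quick illustration of the Level 1 check in Python (a sketch only; the group memberships below are made up):
# Level 1 sketch: one 64-bit code per group type
# user in A1, B2, C1  ->  per-type codes 2**0, 2**1, 2**0
u_g1, u_g2, u_g3 = 1, 2, 1
# message targeted at A1|A3, B*, C1  ->  1+4=5, all bits set, 1
m_g1, m_g2, m_g3 = 5, 2**64 - 1, 1
# the user matches only if every type shares at least one bit
print((u_g1 & m_g1) * (u_g2 & m_g2) * (u_g3 & m_g3) != 0)  # True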
Level 2:
Assumptions:
you have some more group types, and/or you don't know in advance how many they are, or how they are composed
you don't have more than 64 groups in total (e.g. 10 for the first type, 12 for the second, ...)
the message still has only one target as above
Solution:
encode each user group and each message target as a single integer, taking care of the offset: if the first type has 10 groups they will be encoded from 1 to 1023 (2**10-1), then if the second type has 12 groups they will go from 1024 (2**10) to 4194303 (2**(10+12)-1), and so on
you will still have a User table and a Message table, and both will have one single field for the cumulative code
you will need to define a function which is able to check the user group vs the message target separately by each range; this can be difficult to do in SQL, and depends on which engine you are using
the following is a Python implementation of both the encoding and the check:
class IdEncoder:
    def __init__(self, sizes):
        # sizes[i] = number of groups in type i; each type gets its own bit range
        self.sizes = sizes
        self.grouplimits = {}
        offset = 0
        for i, size in enumerate(sizes):
            self.grouplimits[i] = (2**offset, 2**(offset + size) - 1)
            offset += size

    def encode(self, vals):
        # vals[i] is the group spec for type i: e.g. '3', '1|2' or '*'
        n = 0
        for i, val in enumerate(vals):
            if val == '*':
                # all bits of this type's range
                g = self.grouplimits[i][1] - self.grouplimits[i][0] + 1
            else:
                svals = val.split('|')
                g = 0
                for sval in svals:
                    g += 2**(int(sval) - 1)
                if i > 0:
                    # shift the code into this type's bit range
                    g *= self.grouplimits[i][0]
            n += g
        return n

    def check(self, user, message):
        # the user matches only if every type's range shares at least one bit
        res = False
        for i, size in enumerate(self.sizes):
            if user % 2**size & message % 2**size == 0:
                break
            if i < len(self.sizes) - 1:
                user >>= size
                message >>= size
        else:
            res = True
        return res
c = IdEncoder([10, 12, 10])
m3 = c.encode(['1|2', '*', '*'])
u1 = c.encode(['1', '1', '1'])
print(c.check(u1, m3))  # True
u2 = c.encode(['4', '1', '1'])
print(c.check(u2, m3))  # False
Level 3:
Assumptions:
you adopt one of the above solutions, but you need multiple targets for each message
Solution:
You will need a third table, MessageTarget, containing the target code fields as above and a FK linking to the message
The query will search for all the MessageTarget rows compatible with the User group code(s) and show the related Message data
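To make that concrete, a hedged sketch of that query (the table and column names are assumptions, reusing the Level 1 per-type fields):
# hypothetical tables: Message(id, ...), MessageTarget(message_id, g1..gN)
# :u1..:u3 are the user's per-type group codes
query = """
SELECT DISTINCT m.*
FROM Message m
JOIN MessageTarget t ON t.message_id = m.id
WHERE (:u1 & t.g1) * (:u2 & t.g2) * (:u3 & t.g3) <> 0
"""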
So you have 3 main tables:
Messages
Users
Groups
You then create 2 relationship tables:
Message-Group
User-Group
If you want to limit users to have access to just "their" messages then you join:
User > User-Group > Message-Group > Message
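For illustration, that join could look roughly like this (a sketch; the table and column names are assumptions, and note that a plain join like this matches a message if any group matches, so enforcing the match-every-type semantics from the question still needs extra logic):
# hypothetical schema: users(id), user_group(user_id, group_id),
# message_group(message_id, group_id), messages(id)
query = """
SELECT DISTINCT m.*
FROM users u
JOIN user_group ug ON ug.user_id = u.id
JOIN message_group mg ON mg.group_id = ug.group_id
JOIN messages m ON m.id = mg.message_id
WHERE u.id = :user_id
"""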

Loading multiple GTFS.zip format files in r5r

I am trying to analyze trips over a long time period in r5r, which requires more than one GTFS file. I am using a for loop since I want to study trips on various departure dates from the Excel file. Right now, I have placed all three GTFS.zip files, with different names, together in the data path, but I only receive public transport mode information within one date range, while trips on the other two dates produce walk times only. Is there a way to let r5r include all of them?
options(java.parameters = "-Xmx16G")
library(r5r)
library(sf)
library(data.table)
File_Path = file.path("C:","Research", "Data Sep", fsep = .Platform$file.sep)
list.files(File_Path)
poi <- fread(file.path(File_Path, "OriginsDestinationsPugetSound.csv"))
r5r_core <- setup_r5(data_path = File_Path, verbose = TRUE)
mode <- c("WALK","TRANSIT")
max_walk_dist <- 1000 # in meters
max_trip_duration <- 300 # in minutes
LengthOfFile = length(poi[[1]])
ListOfDetailedItineries = (matrix(ncol = 15,nrow = 0))
start = 1
end = 25
for (i in start:end) {
  OriginPoint = poi[i, 2:4]
  DestinationPoint = poi[i, 5:7]
  Time_of_Trip = poi[i, 9]
  departure_datetime = as.POSIXct(Time_of_Trip[[1]], format = "%m/%d/%Y %H:%M")
  dit <- detailed_itineraries(r5r_core = r5r_core,
                              origins = OriginPoint,
                              destinations = DestinationPoint,
                              mode = mode,
                              departure_datetime = departure_datetime,
                              max_walk_dist = max_walk_dist,
                              max_trip_duration = max_trip_duration,
                              shortest_path = TRUE,
                              verbose = TRUE)
  ListOfDetailedItineries = rbind(ListOfDetailedItineries, as.matrix(dit))
  cat('On iteration ', i, '\n', dit[[9]], "\n")
  flush.console()
}
dit1 = as.data.frame(ListOfDetailedItineries)
As far as I understood, you have 3 feeds inside your data directory and you tried to generate travel time estimates for 3 different departure dates, but only one of them returned public transport trips. Is that correct?
If that's the case, you have to make sure that the public transit services listed in your GTFS feeds run on your specified departure dates. This information is usually listed in the calendar table, but it can also be listed in the calendar_dates table in some feeds.
The best practice here would be to choose dates that fall inside the service intervals of all 3 of your feeds. Alternatively, you can edit the start_date/end_date columns of their calendar tables to include the dates you have already chosen.
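To check quickly which dates each feed covers, one option is a small script like this (sketched in Python; it assumes each zip ships a calendar.txt, which some feeds replace with calendar_dates.txt, and reuses the "C:/Research/Data Sep" path from the question):
import csv
import glob
import io
import zipfile

# print each feed's overall service interval so you can pick a departure
# date that is covered by all three feeds at once
for feed in glob.glob("C:/Research/Data Sep/*.zip"):
    with zipfile.ZipFile(feed) as z:
        with z.open("calendar.txt") as f:
            rows = list(csv.DictReader(io.TextIOWrapper(f, "utf-8-sig")))
    print(feed, "runs from", min(r["start_date"] for r in rows),
          "to", max(r["end_date"] for r in rows))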

kannel receiving delivery issue

I'm using the following configuration for Kannel on 3 gateways; each gateway has 2 sessions, one for sending and the other for receiving. The configuration below is for receiving delivery status.
I had no issue with any of them until 3 months ago, when one of them became unable to pull the status; at the same time, the same gateway is connected to more than 6000 clients with no issue at all.
The provider asked me to change register_dlr to 1.
Any idea?
interface-version = 34
host = xx.xx.xx.xx
port = 0
receive-port = 8899
smsc-username = user
smsc-password = pass
system-type = VMA
source-addr-ton = 5
source-addr-npi = 1
dest-addr-ton = 0
dest-addr-npi = 0
keepalive = 600
reconnect-delay = 3
enquire-link-interval = 30
esm-class = 0
msg-id-type = 0x01

Google Spreadsheet Python API read specific column

I have 2 questions regarding the Google Spreadsheet API using Python. My Google spreadsheet looks as follows:
a  b
1  2
3  4

5  6
When I run the script below, I only get:
root@darkbox:~/google_py# python test.py
1
2
3
4
I only want to get the first column, so I want to see:
1
3
5
My second issue: since there is a blank row between the rows, my script is not getting the second part (it should be 5 in this case).
How can I get the specified column and ignore white spaces?
#!/usr/bin/env python
import gdata.docs
import gdata.docs.service
import gdata.spreadsheet.service
import re, os

email = 'xxxx@gmail.com'
password = 'passw0rd'
spreadsheet_key = '14cT5KKKWzup1jK0vc-TyZt6BBwSIyazZz0sA_x0M1Bg'  # key param
worksheet_id = 'od6'  # default
#doc_name = 'python_test'

def main():
    client = gdata.spreadsheet.service.SpreadsheetsService()
    client.debug = False
    client.email = email
    client.password = password
    client.source = 'test client'
    client.ProgrammaticLogin()
    q = gdata.spreadsheet.service.DocumentQuery()
    feed = client.GetSpreadsheetsFeed(query=q)
    feed = client.GetWorksheetsFeed(spreadsheet_key)
    rows = client.GetListFeed(spreadsheet_key, worksheet_id).entry
    for row in rows:
        for key in row.custom:
            print "%s" % (row.custom[key].text)
    return

if __name__ == '__main__':
    main()
To ignore white spaces:
I suggest you switch to the CellFeed; I think the list feed stops reading when it hits whitespace. Sorry, I forget the fine details, but I dropped the list feed and switched to the cell feed a long time ago.
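A hedged sketch of what that could look like with the same client as above (CellQuery's min-col/max-col restrict the feed to the first column; I haven't re-tested this against the old gdata library):
q = gdata.spreadsheet.service.CellQuery()
q.min_col = '1'  # only the first column (column a)
q.max_col = '1'
q.return_empty = 'false'  # skip blank cells instead of returning them
cells = client.GetCellsFeed(spreadsheet_key, worksheet_id, query=q)
for entry in cells.entry:
    print "%s" % entry.cell.text  # 1, 3, 5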