Related
I have a websocket connection to Binance in my script. The websocket runs forever as usual. I get each pair's output as a separate message on my multiple-stream connection.
For example, here is a sample of the output:
{'stream': 'reefusdt#kline_1m', 'data': {'e': 'kline', 'E': 1651837066242, 's': 'REEFUSDT', 'k': {'t': 1651837020000, 'T': 1651837079999, 's': 'REEFUSDT', 'i': '1m', 'f': 95484416, 'L': 95484538, 'o': '0.006620', 'c': '0.006631', 'h': '0.006631', 'l': '0.006619', 'v': '1832391', 'n': 123, 'x': False, 'q': '12138.640083', 'V': '930395', 'Q': '6164.398584', 'B': '0'}}}
{'stream': 'ethusdt#kline_1m', 'data': {'e': 'kline', 'E': 1651837066253, 's': 'ETHUSDT', 'k': {'t': 1651837020000, 'T': 1651837079999, 's': 'ETHUSDT', 'i': '1m', 'f': 1613620941, 'L': 1613622573, 'o': '2671.86', 'c': '2675.79', 'h': '2675.80', 'l': '2671.81', 'v': '1018.530', 'n': 1633, 'x': False, 'q': '2723078.35891', 'V': '702.710', 'Q': '1878876.68612', 'B': '0'}}}
{'stream': 'ancusdt#kline_1m', 'data': {'e': 'kline', 'E': 1651837066257, 's': 'ANCUSDT', 'k': {'t': 1651837020000, 'T': 1651837079999, 's': 'ANCUSDT', 'i': '1m', 'f': 10991664, 'L': 10992230, 'o': '2.0750', 'c': '2.0810', 'h': '2.0820', 'l': '2.0740', 'v': '134474.7', 'n': 567, 'x': False, 'q': '279289.07500', 'V': '94837.8', 'Q': '197006.89950', 'B': '0'}}}
Is there a way to reshape this output as shown below? The main struggle is that each output ends up as a different dataframe, and I want to merge them into one single dataframe. Each message comes as a single nested dict with two keys, "stream" and "data"; "data" holds 4 fields, and the last one, "k", is another dict of 17 fields. I somehow managed to get only "k" out of it:
json_message = json.loads(message)
result = json_message["data"]["k"]
and sample output is:
{'t': 1651837560000, 'T': 1651837619999, 's': 'CTSIUSDT', 'i': '1m', 'f': 27238014, 'L': 27238039, 'o': '0.2612', 'c': '0.2606', 'h': '0.2613', 'l': '0.2605', 'v': '17057', 'n': 26, 'x': False, 'q': '4449.1499', 'V': '3185', 'Q': '831.2502', 'B': '0'}
{'t': 1651837560000, 'T': 1651837619999, 's': 'ETCUSDT', 'i': '1m', 'f': 421543741, 'L': 421543977, 'o': '27.420', 'c': '27.398', 'h': '27.430', 'l': '27.397', 'v': '2988.24', 'n': 237, 'x': False, 'q': '81936.97951', 'V': '1848.40', 'Q': '50688.14941', 'B': '0'}
{'t': 1651837560000, 'T': 1651837619999, 's': 'ETHUSDT', 'i': '1m', 'f': 1613645553, 'L': 1613647188, 'o': '2671.38', 'c': '2669.95', 'h': '2672.38', 'l': '2669.70', 'v': '777.746', 'n': 1636, 'x': False, 'q': '2077574.75281', 'V': '413.365', 'Q': '1104234.98707', 'B': '0'}
I want to merge these outputs into a single dataframe of 6 columns and almost 144 rows, closer to the screenshot provided below. The only difference is that my code creates a separate dataframe for each output.
Create a list of your messages. Your message list should look like this:
message_list = [message1, message2, message3]
Then build one dataframe from the whole list, one row per message:
import pandas as pd

frames = [pd.DataFrame(msg, index=[i]) for i, msg in enumerate(message_list)]
df = pd.concat(frames, ignore_index=True)
print(df)
t T s i f L o c h l v n x q V Q B
0 1651837560000 1651837619999 CTSIUSDT 1m 27238014 27238039 0.2612 0.2606 0.2613 0.2605 17057 26 False 4449.1499 3185 831.2502 0
1 1651837560000 1651837619999 ETCUSDT 1m 421543741 421543977 27.420 27.398 27.430 27.397 2988.24 237 False 81936.97951 1848.40 50688.14941 0
2 1651837560000 1651837619999 ETHUSDT 1m 1613645553 1613647188 2671.38 2669.95 2672.38 2669.70 777.746 1636 False 2077574.75281 413.365 1104234.98707 0
You can manipulate the dataframe later as needed.
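If the messages arrive one at a time in a websocket callback rather than all at once, here is a minimal sketch of the same idea, assuming a websocket-client style on_message callback where message is the raw JSON string from the combined stream:
import json
import pandas as pd

rows = []  # parsed "k" dicts, one per incoming message

def on_message(ws, message):
    # keep only the kline part of each combined-stream payload
    rows.append(json.loads(message)["data"]["k"])

# once enough messages have been collected, build a single dataframe
df = pd.DataFrame(rows)
print(df)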
I have the following two pieces of code with their output. One is done in graph objects and the other using plotly express. As you can see, the one in ‘go’ doesn’t have a legend, and the one in ‘px’ doesn’t have individual bar widths. So how can I either get a legend for the first one, or fix the widths in the other?
import plotly.graph_objects as go
import pandas as pd
df = pd.DataFrame({'PHA': [451, 149, 174, 128, 181, 175, 184, 545, 131, 106, 1780, 131, 344, 624, 236, 224, 178, 277, 141, 171, 164, 410],
'PHA_cum': [451, 600, 774, 902, 1083, 1258, 1442, 1987, 2118, 2224, 4004, 4135, 4479, 5103, 5339, 5563, 5741, 6018, 6159, 6330, 6494, 6904],
'trans_cost_cum': [0.14, 0.36, 0.6, 0.99, 1.4, 2.07, 2.76, 3.56, 4.01, 4.5, 5.05, 5.82, 5.97, 6.13, 6.33, 6.53, 6.65, 6.77, 6.9, 7.03, 7.45, 7.9],
'Province': ['East', 'East', 'East', 'East', 'East', 'Lapland', 'Lapland', 'Lapland', 'Oulu', 'Oulu', 'Oulu', 'Oulu', 'South', 'South', 'South', 'South', 'West', 'West', 'West', 'West', 'West', 'West'],
})
col_list = {'South': 'rgb(222,203,228)',
'West': 'rgb(204,235,197)',
'East': 'rgb(255,255,204)',
'Oulu': 'rgb(179,205,227)',
'Lapland': 'rgb(254,217,166)'}
provs = df['Province'].to_list()
colors = [col_list.get(item, item) for item in provs]
fig = go.Figure(data=[go.Bar(
    x=df['PHA_cum']-df['PHA']/2,
    y=df['trans_cost_cum'],
    width=df['PHA'],
    marker_color=colors
)])
fig.show()
import plotly.express as px
import pandas as pd
df = pd.DataFrame({'PHA': [451, 149, 174, 128, 181, 175, 184, 545, 131, 106, 1780, 131, 344, 624, 236, 224, 178, 277, 141, 171, 164, 410],
'PHA_cum': [451, 600, 774, 902, 1083, 1258, 1442, 1987, 2118, 2224, 4004, 4135, 4479, 5103, 5339, 5563, 5741, 6018, 6159, 6330, 6494, 6904],
'trans_cost_cum': [0.14, 0.36, 0.6, 0.99, 1.4, 2.07, 2.76, 3.56, 4.01, 4.5, 5.05, 5.82, 5.97, 6.13, 6.33, 6.53, 6.65, 6.77, 6.9, 7.03, 7.45, 7.9],
'Province': ['East', 'East', 'East', 'East', 'East', 'Lapland', 'Lapland', 'Lapland', 'Oulu', 'Oulu', 'Oulu', 'Oulu', 'South', 'South', 'South', 'South', 'West', 'West', 'West', 'West', 'West', 'West'],
})
fig = px.bar(df,
x=df['PHA_cum']-df['PHA']/2,
y=df['trans_cost_cum'],
color="Province",
color_discrete_sequence=px.colors.qualitative.Pastel1
)
fig.show()
Using graph_objects, you'll need to pass in each Province as a separate trace in order for the legend to populate. See below; the only real change is looping through the data per Province.
import plotly.graph_objects as go
import pandas as pd

df = pd.DataFrame({'PHA': [451, 149, 174, 128, 181, 175, 184, 545, 131, 106, 1780, 131, 344, 624, 236, 224, 178, 277, 141, 171, 164, 410],
'PHA_cum': [451, 600, 774, 902, 1083, 1258, 1442, 1987, 2118, 2224, 4004, 4135, 4479, 5103, 5339, 5563, 5741, 6018, 6159, 6330, 6494, 6904],
'trans_cost_cum': [0.14, 0.36, 0.6, 0.99, 1.4, 2.07, 2.76, 3.56, 4.01, 4.5, 5.05, 5.82, 5.97, 6.13, 6.33, 6.53, 6.65, 6.77, 6.9, 7.03, 7.45, 7.9],
'Province': ['East', 'East', 'East', 'East', 'East', 'Lapland', 'Lapland', 'Lapland', 'Oulu', 'Oulu', 'Oulu', 'Oulu', 'South', 'South', 'South', 'South', 'West', 'West', 'West', 'West', 'West', 'West'],
})
col_list = {'South': 'rgb(222,203,228)',
'West': 'rgb(204,235,197)',
'East': 'rgb(255,255,204)',
'Oulu': 'rgb(179,205,227)',
'Lapland': 'rgb(254,217,166)'}
#provs = df['Province'].to_list()
#colors = [col_list.get(item, item) for item in provs]
fig = go.Figure()
for p in df['Province'].unique():
    dat = df[df.Province == p]
    fig.add_trace(go.Bar(
        name=p,
        x=dat['PHA_cum']-dat['PHA']/2,
        y=dat['trans_cost_cum'],
        width=dat['PHA'],
        marker_color=col_list[p]
    ))
fig.show()
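Alternatively, if you prefer to keep the px version and its automatic legend, one option might be to set the per-bar widths on the traces px generates (one trace per Province). A sketch, assuming each generated trace is named after its Province:
import plotly.express as px
import pandas as pd

# df as defined above
fig = px.bar(df,
             x=df['PHA_cum'] - df['PHA'] / 2,
             y=df['trans_cost_cum'],
             color="Province",
             color_discrete_sequence=px.colors.qualitative.Pastel1)

# give every trace the PHA widths of its own Province's rows
for tr in fig.data:
    tr.width = df.loc[df['Province'] == tr.name, 'PHA'].to_numpy()

fig.show()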
I have two dataframes df1 and df2.
df1 = pd.DataFrame({
'Col1': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat'],
'Col2': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
'Col3': [100, 120, 130, 200, 190, 210],})
df2 = pd.DataFrame({
'Col1': ['abc', 'qrt', 'xyz', 'mas', 'apc', 'ywt'],
'Col2': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
'Col4': [120, 140, 120, 200, 190, 210],})
I do an outer join on the two dataframes:
df = pd.merge(df1, df2[['Col1', 'Col4']], on= 'Col1', how='outer')
I get a new dataframe, but I don't get the Col2 entries from df2. I get:
df = pd.DataFrame({
'Col1': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat', 'mas', 'apc', 'ywt'],
'Col2': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses', 'NaN', 'NaN', 'NaN'],
'Col3': [100, 120, 130, 200, 190, 210, 'NaN', 'NaN', 'NaN'],
'Col4': [120, 140, 120, 'NaN', 'NaN', 'NaN', '200', '190', '210']})
But what I want is:
df = pd.DataFrame({
'Col1': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat', 'mas', 'apc', 'ywt'],
'Col2': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
'Col3': [100, 120, 130, 200, 190, 210, 'NaN', 'NaN', 'NaN'],
'Col4': [120, 140, 120, 'NaN', 'NaN', 'NaN', '200', '190', '210']})
I want the Col2 entries from df2 to be filled in for the new rows of the merged dataframe.
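One way to get that would be to merge on both key columns so that Col2 becomes part of the join key. A minimal sketch, assuming the Col1/Col2 pairings line up as in the example:
import pandas as pd

df = pd.merge(df1, df2, on=['Col1', 'Col2'], how='outer')
print(df)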
I am scraping one particular page with a headless chromedriver
The page is really huge, to load it entirely I need 10k+ clicks on a lazy load button
The more I click, the slower things get
Is there a way to make the process faster?
Here is the code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from tqdm import tqdm

def driver_config():
    chrome_options = Options()
    prefs = {"profile.managed_default_content_settings.images": 2}
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.page_load_strategy = 'eager'
    chrome_options.add_argument("--headless")
    driver = webdriver.Chrome(options=chrome_options)
    return driver

def scroll_the_category_until_the_end(driver, category_url):
    driver.get(category_url)
    pbar = tqdm()
    pbar.write('initializing spin')
    while True:
        try:
            show_more_button = WebDriverWait(driver, 20).until(
                EC.element_to_be_clickable((By.XPATH, '//*[@id="root"]/div/div[2]/div[2]/div[2]/button')))
            driver.execute_script("arguments[0].click();", show_more_button)
            pbar.update()
        except TimeoutException:
            pbar.write('docking')
            pbar.close()
            break

driver = driver_config()
scroll_the_category_until_the_end(driver, 'https://supl.biz/russian-federation/stroitelnyie-i-otdelochnyie-materialyi-supplierscategory9403/')
UPDATE:
I also tried to implement another strategy but it didn't work:
deleting all company information on every iteration
clearing the driver cache
My hypothesis was that if I did this, the DOM would always stay clean and fast
driver = driver_config()
driver.get('https://supl.biz/russian-federation/stroitelnyie-i-otdelochnyie-materialyi-supplierscategory9403/')
pbar = tqdm()
pbar.clear()
while True:
    try:
        # remove every already-loaded company card from the DOM
        for el in driver.find_elements(By.CLASS_NAME, 'a_zvOKG8vZ'):
            driver.execute_script("""var element = arguments[0];element.parentNode.removeChild(element);""", el)
        button = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.XPATH, "//*[contains(text(), 'Показать больше поставщиков')]")))
        driver.execute_script("arguments[0].click();", button)
        pbar.update()
        driver.execute_script("window.localStorage.clear();")
    except Exception as e:
        pbar.close()
        print(e)
        break
First, the website invokes JavaScript to grab new data: clicking the 'more results' button triggers an HTTP request to an API, and the response is the data needed to load the page with more results. You can view this request by inspecting the page --> Network tools --> XHR and then clicking the button. It sends an HTTP GET request to an API which has data on each product.
The most efficient way to grab data from a website that invokes JavaScript is to re-engineer the HTTP request the JavaScript is making.
In this case it's relatively easy: I copied the request as a cURL command from the XHR tab and converted it to Python using curl.trillworks.com.
This is the screen you get to in the XHR tab, before clicking the 'more results' button.
After clicking the 'more results' button, notice how a request has populated the screen.
Here I'm copying the cURL request to grab the necessary headers etc. You can then paste it into curl.trillworks.com, which converts the request into params, cookies and headers and gives you boilerplate for the requests package.
I had a play around with the request using the requests package. The copied request includes cookies along with the headers, but they're actually not necessary when you make the request.
The simplest request to make is one without headers, parameters or cookies, but most API endpoints don't accept this. In this case, having played around with the requests package, you need a user-agent and the parameters that specify what data you get back from the API. In fact, you don't even need a valid user-agent.
Now you could invoke a while loop to keep making HTTP requests in sizes of 8. Unfortunately, altering only the size parameter of a single request doesn't get you all the data!
Coding Example
import requests
import time

headers = {
    'user-agent': 'M'
}

i = 8
j = 1
while True:
    params = (
        ('category', '9403'),
        ('city', 'russian-federation'),
        ('page', f'{j}'),
        ('size', f'{i}'),
    )
    response = requests.get('https://supl.biz/api/monolith/suppliers-catalog/search/', headers=headers, params=params)
    if response.status_code != 200:
        break
    print(response.json()['hits'][0])
    i += 8
    j += 1
    time.sleep(4)
Output
Sample output
{'id': 1373827,
'type': None,
'highlighted': None,
'count_days_on_tariff': 183,
'tariff_info': {'title_for_show': 'Поставщик Премиум',
'finish_date': '2021-02-13',
'url': '/supplier-premium-membership/',
'start_date': '2020-08-13'},
'origin_ru': {'id': 999, 'title': 'Санкт-Петербург'},
'title': 'ООО "СТАНДАРТ 10"',
'address': 'Пискаревский проспект, 150, корпус 2.',
'inn': '7802647317',
'delivery_types': ['self', 'transportcompany', 'suppliercars', 'railway'],
'summary': 'Сэндвич-панели: новые, 2й сорт, б/у. Холодильные камеры: новые, б/у. Двери для холодильных камер: новые, б/у. Строительство холодильных складов, ангаров и др. коммерческих объектов из сэндвич-панелей. Холодильное оборудование: новое, б/у.',
'phone': '79219602762',
'hash_id': 'lMJPgpEz7b',
'payment_types': ['cache', 'noncache'],
'logo_url': 'https://suplbiz-a.akamaihd.net/media/cache/37/9e/379e9fafdeaab4fc5a068bc90845b56b.jpg',
'proposals_count': 4218,
'score': 42423,
'reviews': 0,
'rating': '0.0',
'performed_orders_count': 1,
'has_replain_chat': False,
'verification_status': 2,
'proposals': [{'id': 20721916,
'title': 'Сэндвич панели PIR 100',
'description': 'Сэндвич панели. Наполнение Пенополиизлцианурат ПИР PIR. Толщина 100мм. Длина 3,2 метра. Rall9003/Rall9003. Вналичии 600м2. Количество: 1500',
'categories': [135],
'price': 1250.0,
'old_price': None,
'slug': 'sendvich-paneli-pir-100',
'currency': 'RUB',
'price_details': 'Цена за шт.',
'image': {'preview_220x136': 'https://suplbiz-a.akamaihd.net/media/cache/72/4d/724d0ba4d4a2b7d459f3ca4416e58d7d.jpg',
'image_dominant_color': '#ffffff',
'preview_140': 'https://suplbiz-a.akamaihd.net/media/cache/67/45/6745bb6f616b82f7cd312e27814b6b89.jpg',
'hash': 'd41d8cd98f00b204e9800998ecf8427e'},
'additional_images': [],
'availability': 1,
'views': 12,
'seo_friendly': False,
'user': {'id': 1373827,
'name': 'ООО "СТАНДАРТ 10"',
'phone': '+79219602762',
'address': 'Пискаревский проспект, 150, корпус 2.',
'origin_id': 999,
'country_id': 1,
'origin_title': 'Санкт-Петербург',
'verified': False,
'score': 333,
'rating': 0.0,
'reviews': 0,
'tariff': {'title_for_show': 'Поставщик Премиум',
'count_days_on_tariff': 183,
'finish_date': '2021-02-13',
'url': '/supplier-premium-membership/',
'start_date': '2020-08-13'},
'performed_orders_count': 1,
'views': 12,
'location': {'lon': 30.31413, 'lat': 59.93863}}},
{'id': 20722131,
'title': 'Сэндвич панели ппу100 б/у, 2,37 м',
'description': 'Сэндвич панели. Наполнение Пенополиуретан ППУ ПУР PUR. Толщина 100 мм. длинна 2,37 метра. rall9003/rall9003. БУ. В наличии 250 м2.',
'categories': [135],
'price': 800.0,
'old_price': None,
'slug': 'sendvich-paneli-ppu100-b-u-2-37-m',
'currency': 'RUB',
'price_details': 'Цена за шт.',
'image': {'preview_220x136': 'https://suplbiz-a.akamaihd.net/media/cache/d1/49/d1498144bc7b324e288606b0d7d98120.jpg',
'image_dominant_color': '#ffffff',
'preview_140': 'https://suplbiz-a.akamaihd.net/media/cache/10/4b/104b54cb9b7ddbc6b2f0c1c5a01cdc2d.jpg',
'hash': 'd41d8cd98f00b204e9800998ecf8427e'},
'additional_images': [],
'availability': 1,
'views': 4,
'seo_friendly': False,
'user': {'id': 1373827,
'name': 'ООО "СТАНДАРТ 10"',
'phone': '+79219602762',
'address': 'Пискаревский проспект, 150, корпус 2.',
'origin_id': 999,
'country_id': 1,
'origin_title': 'Санкт-Петербург',
'verified': False,
'score': 333,
'rating': 0.0,
'reviews': 0,
'tariff': {'title_for_show': 'Поставщик Премиум',
'count_days_on_tariff': 183,
'finish_date': '2021-02-13',
'url': '/supplier-premium-membership/',
'start_date': '2020-08-13'},
'performed_orders_count': 1,
'views': 4,
'location': {'lon': 30.31413, 'lat': 59.93863}}},
{'id': 20722293,
'title': 'Холодильная камера polair 2.56х2.56х2.1',
'description': 'Холодильная камера. Размер 2,56 Х 2,56 Х 2,1. Камера из сэндвич панелей ППУ80. Камера с дверью. -5/+5 или -18. В наличии. Подберем моноблок или сплит систему. …',
'categories': [478],
'price': 45000.0,
'old_price': None,
'slug': 'holodilnaya-kamera-polair-2-56h2-56h2-1',
'currency': 'RUB',
'price_details': 'Цена за шт.',
'image': {'preview_220x136': 'https://suplbiz-a.akamaihd.net/media/cache/c1/9f/c19f38cd6893a3b94cbdcbdb8493c455.jpg',
'image_dominant_color': '#ffffff',
'preview_140': 'https://suplbiz-a.akamaihd.net/media/cache/4d/b0/4db06a2508cccf5b2e7fe822c1b892a2.jpg',
'hash': 'd41d8cd98f00b204e9800998ecf8427e'},
'additional_images': [],
'availability': 1,
'views': 5,
'seo_friendly': False,
'user': {'id': 1373827,
'name': 'ООО "СТАНДАРТ 10"',
'phone': '+79219602762',
'address': 'Пискаревский проспект, 150, корпус 2.',
'origin_id': 999,
'country_id': 1,
'origin_title': 'Санкт-Петербург',
'verified': False,
'score': 333,
'rating': 0.0,
'reviews': 0,
'tariff': {'title_for_show': 'Поставщик Премиум',
'count_days_on_tariff': 183,
'finish_date': '2021-02-13',
'url': '/supplier-premium-membership/',
'start_date': '2020-08-13'},
'performed_orders_count': 1,
'views': 5,
'location': {'lon': 30.31413, 'lat': 59.93863}}},
{'id': 20722112,
'title': 'Сэндвич панели ппу 80 б/у, 2,4 м',
'description': 'Сэндвич панели. Наполнение ППУ. Толщина 80 мм. длинна 2,4 метра. БУ. В наличии 350 м2.',
'categories': [135],
'price': 799.0,
'old_price': None,
'slug': 'sendvich-paneli-ppu-80-b-u-2-4-m',
'currency': 'RUB',
'price_details': 'Цена за шт.',
'image': {'preview_220x136': 'https://suplbiz-a.akamaihd.net/media/cache/ba/06/ba069a73eda4641030ad69633d79675d.jpg',
'image_dominant_color': '#ffffff',
'preview_140': 'https://suplbiz-a.akamaihd.net/media/cache/4f/e9/4fe9f3f358f775fa828c532a6c08e7f2.jpg',
'hash': 'd41d8cd98f00b204e9800998ecf8427e'},
'additional_images': [],
'availability': 1,
'views': 8,
'seo_friendly': False,
'user': {'id': 1373827,
'name': 'ООО "СТАНДАРТ 10"',
'phone': '+79219602762',
'address': 'Пискаревский проспект, 150, корпус 2.',
'origin_id': 999,
'country_id': 1,
'origin_title': 'Санкт-Петербург',
'verified': False,
'score': 333,
'rating': 0.0,
'reviews': 0,
'tariff': {'title_for_show': 'Поставщик Премиум',
'count_days_on_tariff': 183,
'finish_date': '2021-02-13',
'url': '/supplier-premium-membership/',
'start_date': '2020-08-13'},
'performed_orders_count': 1,
'views': 8,
'location': {'lon': 30.31413, 'lat': 59.93863}}},
{'id': 20722117,
'title': 'Сэндвич панели ппу 60 мм, 2,99 м',
'description': 'Сэндвич панели. Наполнение Пенополиуретан ППУ ПУР PUR . Новые. В наличии 600 м2. Толщина 60 мм. длинна 2,99 метров. rall9003/rall9003',
'categories': [135],
'price': 1100.0,
'old_price': None,
'slug': 'sendvich-paneli-ppu-60-mm-2-99-m',
'currency': 'RUB',
'price_details': 'Цена за шт.',
'image': {'preview_220x136': 'https://suplbiz-a.akamaihd.net/media/cache/e2/fb/e2fb6505a5af74a5a994783a5e51600c.jpg',
'image_dominant_color': '#ffffff',
'preview_140': 'https://suplbiz-a.akamaihd.net/media/cache/9c/f5/9cf5905a26e6b2ea1fc16d50c19ef488.jpg',
'hash': 'd41d8cd98f00b204e9800998ecf8427e'},
'additional_images': [],
'availability': 1,
'views': 10,
'seo_friendly': False,
'user': {'id': 1373827,
'name': 'ООО "СТАНДАРТ 10"',
'phone': '+79219602762',
'address': 'Пискаревский проспект, 150, корпус 2.',
'origin_id': 999,
'country_id': 1,
'origin_title': 'Санкт-Петербург',
'verified': False,
'score': 333,
'rating': 0.0,
'reviews': 0,
'tariff': {'title_for_show': 'Поставщик Премиум',
'count_days_on_tariff': 183,
'finish_date': '2021-02-13',
'url': '/supplier-premium-membership/',
'start_date': '2020-08-13'},
'performed_orders_count': 1,
'views': 10,
'location': {'lon': 30.31413, 'lat': 59.93863}}}]}
Explanation
Here we're making sure the response status is 200 before making another request. Using f-strings we change the page by 1 and the size of the JSON results by 8 on each iteration of the while loop. I've imposed a time delay per request, because if you push too many HTTP requests at once you'll end up getting IP banned. Be gentle on the server!
The response.json() method converts the JSON response into a Python dictionary. You haven't specified which data you need, but if you can handle a Python dictionary you can grab the data you require.
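For example, a minimal sketch of pulling a few of the fields shown in the sample output above into a flat list (the key names are taken from that sample response and are assumed to be present on every hit):
suppliers = []
for hit in response.json()['hits']:
    suppliers.append({
        'id': hit['id'],
        'title': hit['title'],
        'phone': hit['phone'],
        'city': hit['origin_ru']['title'],
        'proposals_count': hit['proposals_count'],
    })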
Comments
Here is where the parameters come from; you can see the page and size data here.
My dataframe:
data_part = [{'Part': 'A', 'Engine': True, 'TurboCharger': True, 'Restricted': True},
{'Part': 'B', 'Engine': False, 'TurboCharger': True, 'Restricted': False},]
My expect output is this:
{'A': {'Engine': 1, 'TurboCharger': 1, 'Restricted': 1},
'B': {'TurboCharger': 1}}
This is what I am doing:
df_part = pd.DataFrame(data_part).set_index('Part').astype(int).to_dict('index')
This is what it gives:
{'A': {'Engine': 1, 'TurboCharger': 1, 'Restricted': 1},
'B': {'Engine': 0, 'TurboCharger': 1, 'Restricted': 0}}
Is there anything that can be done to reach the expected output?
We can fix your output
d = (pd.DataFrame(data_part)
       .set_index('Part')
       .astype(int)
       .stack()
       .loc[lambda x: x != 0]
       .reset_index('Part')
       .groupby('Part')
       .agg(dict)[0]
       .to_dict())
Out[192]:
{'A': {'Engine': 1, 'TurboCharger': 1, 'Restricted': 1},
'B': {'TurboCharger': 1}}
You may call agg before to_dict
df_part = (pd.DataFrame(data_part).set_index('Part')
             .agg(lambda x: dict(x[x].astype(int)), axis=1)
             .to_dict())
Out[60]:
{'A': {'Engine': 1, 'Restricted': 1, 'TurboCharger': 1},
'B': {'TurboCharger': 1}}
Here's a way to convert the list to a dict without pandas:
from pprint import pprint
data_2 = dict()
for dp in data_part:
    ts = [(k, v) for k, v in dp.items()]
    key = ts[0][1]
    values = {k: int(v) for k, v in ts[1:] if v}
    data_2[key] = values
pprint(data_2)
{'A': {'Engine': 1, 'Restricted': 1, 'TurboCharger': 1},
'B': {'TurboCharger': 1}}