Loading dataset from external file system (https) directly into Spark dataframe

I'm trying to load a CSV dataset directly from an external file system, but I get a 401 Unauthorized response whenever I call sparkContext.addFile(). Is there a way to add authorization headers to the request before adding the file? Or is there a better way to load a CSV file as a DataFrame?
This is what I'm trying now; it throws an exception on the addFile() call.
import org.apache.spark.SparkFiles

spark.sparkContext.addFile(urlPath)

val df = spark.read
  .option("header", true)
  .csv("file://" + SparkFiles.get(urlPath))


Parsing Requests Module Response JSON

I am trying to use the [OpenMeteo API][1] to download CSVs via a script. Using their API Builder and their clickable site, I can do it for a single site just fine.
In trying to write a function that accomplishes this in a loop via the requests library, I am stumped. I cannot get the JSON data out of the requests.models.Response object and into JSON, or better yet, CSV/pandas format.
I cannot parse the requests module response into anything. The request appears to succeed (a 200 response) and to have data, but I cannot get the data out of the response format.
import requests

latitude = 60.358
longitude = -148.939
request_txt = r"https://archive-api.open-meteo.com/v1/archive?latitude=" + str(latitude) + "&longitude=" + str(longitude) + "&start_date=1959-01-01&end_date=2023-02-10&models=era5_land&daily=temperature_2m_max,temperature_2m_min,temperature_2m_mean,shortwave_radiation_sum,precipitation_sum,rain_sum,snowfall_sum,et0_fao_evapotranspiration&timezone=America%2FAnchorage&windspeed_unit=ms"
r = requests.get(request_txt).json()
[1]: https://open-meteo.com/en/docs/historical-weather-api
The URL needs format=csv added as a parameter. StringIO is then used to read the downloaded response into memory as a file-like object.
from io import StringIO

import pandas as pd
import requests

latitude = 60.358
longitude = -148.939

url = ("https://archive-api.open-meteo.com/v1/archive?"
       f"latitude={latitude}&"
       f"longitude={longitude}&"
       "start_date=1959-01-01&"
       "end_date=2023-02-10&"
       "models=era5_land&"
       "daily=temperature_2m_max,temperature_2m_min,temperature_2m_mean,"
       "shortwave_radiation_sum,precipitation_sum,rain_sum,snowfall_sum,"
       "et0_fao_evapotranspiration&"
       "timezone=America%2FAnchorage&"
       "windspeed_unit=ms&"
       "format=csv")

with requests.Session() as session:
    response = session.get(url, timeout=30)
    response.raise_for_status()  # raises for any 4xx/5xx status

# The leading rows hold location metadata, hence skiprows=2.
df = pd.read_csv(StringIO(response.text), sep=",", skiprows=2)
print(df)
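As an aside, requests can build and URL-encode the query string itself through its params argument, which avoids hand-assembling the long URL. A minimal sketch of the same request:

from io import StringIO

import pandas as pd
import requests

params = {
    "latitude": 60.358,
    "longitude": -148.939,
    "start_date": "1959-01-01",
    "end_date": "2023-02-10",
    "models": "era5_land",
    "daily": ",".join([
        "temperature_2m_max", "temperature_2m_min", "temperature_2m_mean",
        "shortwave_radiation_sum", "precipitation_sum", "rain_sum",
        "snowfall_sum", "et0_fao_evapotranspiration",
    ]),
    "timezone": "America/Anchorage",  # requests percent-encodes the slash
    "windspeed_unit": "ms",
    "format": "csv",
}
response = requests.get("https://archive-api.open-meteo.com/v1/archive",
                        params=params, timeout=30)
response.raise_for_status()
df = pd.read_csv(StringIO(response.text), skiprows=2)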

How can I convert a Telegram Session File to a Session String

I'm trying to get the Session String from an existing Pyrogram Session File. How can I do that?
from dotenv import dotenv_values
from pyrogram import Client

config = dotenv_values(dotenv_path='./.env')

app = Client(
    # name="withstring",
    name="my_bot",
    # api_id=config.get("API_ID"),
    # api_hash=config.get("API_HASH"),
    bot_token=config.get("BOT_TOKEN"),
)

with app:
    app.send_message("username", text="Hello world Minhaz!")
    s = app.export_session_string()
    # print(s)

app.run()
The Session File is an SQLite database storing your authorization against the API and the peers you've met (messages received, chats joined, etc.).
To get a Session String for authenticating in memory (losing peers when you log in again), you can just call the Client.export_session_string() method.
Edit to add: If you already have a session file, you can use its name to log in instead of creating a new in-memory session. If you have a my_account.session file, use Client("my_account") when instantiating your Client.
from pyrogram import Client

app = Client(":memory:")

with app:
    session = app.export_session_string()
    print(session)
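Putting the two together, a minimal sketch, assuming an existing my_account.session file in the working directory (the name is a placeholder for your own session file):

from pyrogram import Client

# "my_account" reuses the existing my_account.session file,
# so no new login is required.
app = Client("my_account")

with app:
    print(app.export_session_string())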

Is there a solution to uploading a CSV file to SQL

Any time I try uploading a CSV file to Google Cloud BigQuery, I get an error response. I tried uploading via Google Drive, but the preview button won't show on the table. I need help with how I can resolve this, please.
You may want to try loading the CSV data from Cloud Storage. I used the following Python code and was able to load a CSV file into BigQuery successfully:
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    skip_leading_rows=1,
    # The source format defaults to CSV, so the line below is optional.
    source_format=bigquery.SourceFormat.CSV,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)  # Make an API request.
print("Loaded {} rows.".format(destination_table.num_rows))

Read CSV from S3 and upload to external API as multipart

I want to read a CSV file from an S3 bucket using boto3 and upload it to an external API as a multipart/form-data request.
So far I am able to read the CSV:
response = s3.get_object(Bucket=bucket, Key=key)
body = response['Body']
I'm not sure how to convert this body into a multipart request. The external API takes requests as multipart/form-data.
Any suggestions would be helpful.
The following approach solved my issue.

from requests_toolbelt import MultipartEncoder

body = response['Body'].read()

multipart_data = MultipartEncoder(
    fields={
        'file': (file_name, body, 'application/vnd.ms-excel'),
        'field01': 'test'
    }
)

The .read() method returns the S3 object's contents as a byte string.
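To actually send the request, pass the encoder as the body and copy its content_type attribute (which carries the generated boundary) into the Content-Type header. A minimal sketch, assuming a placeholder URL for the external API:

import requests

# 'https://api.example.com/upload' is a placeholder for the external API.
resp = requests.post(
    'https://api.example.com/upload',
    data=multipart_data,
    headers={'Content-Type': multipart_data.content_type},
)
resp.raise_for_status()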

How to POST an XML file in the JMeter body instead of a physical file

I'm using JMeter 3.2.
My requirement is to read an XML file from disk and replace some tags with dynamic values, so that each thread uploads a unique XML file (NOT a SOAP request). The following code in a JSR223 sampler works perfectly when I upload the new file through a POST in an HTTP sampler with ${newfilename}, file type text/xml.
import org.apache.commons.io.FileUtils;

try {
    String content = FileUtils.readFileToString(new File("E:/test.xml"));
    content = content.replaceAll("SUB_ID", "${__UUID}");
    content = content.replaceAll("ABN_ID", "${empabn}");
    content = content.replaceAll("EMPNAME", "${empname}");
    vars.put("content", content);
    FileUtils.writeStringToFile(new File("E:/testnew${empname}.xml"), content);
}
catch (Throwable ex) {
    log.info("What happened?", ex);
    throw ex;
}
Instead of writing the file to disk again and uploading it again, how can I send the contents of the 'content' string as part of the request body? I have looked at many posts that talk about input/output streams, but they are confusing. When I try to send just ${content} in the body, the application throws the following error:
HTTP Status 500 - Could not write JSON: Name is null (through reference chain: com.xxx.xxx.datafile.rest.DataFileResponse["validationStatus"]); nested exception is com.fasterxml.jackson.databind.JsonMappingException: Name is null (through reference chain:
Appreciate your help.
Multipart POST requests, which are used for file uploads, are different from normal POST requests, so there is no way to simply substitute the file with a generated in-memory string.
You need to replicate the request exactly as it would be sent by JMeter or a real browser: manually populate each part, starting with defining the boundary via the HTTP Header Manager and ending with creating the Content-Disposition headers where you specify your file contents.
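A sketch of the shape such a request needs, where the boundary token, part name, and filename are placeholders that must match what the server expects (exact line endings matter):

In the HTTP Header Manager:

Content-Type: multipart/form-data; boundary=XXXXX

In the sampler's Body Data:

--XXXXX
Content-Disposition: form-data; name="file"; filename="test.xml"
Content-Type: text/xml

${content}
--XXXXX--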
A little hint: you don't need to generate/substitute the values on each call; it is enough to replace them once, and JMeter will substitute them on its own, given you use the __eval() and __FileToString() functions in combination.
You can check out Testing REST API File Uploads in JMeter for an example of building a relatively complex file-upload request; your case will be easier, but still tricky.
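For example, assuming the tokens in E:/test.xml are rewritten once as JMeter variable references (${empabn}, ${empname}, ${__UUID}), putting the following into the request body makes JMeter re-read the file and evaluate the references on every call:

${__eval(${__FileToString(E:/test.xml,,)})}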