GET API call returns non-text content - api

I am making an API call and reading the response in JSON
r = requests.get(query_url, headers=headers)
pprint(r.json())
But the 'content' is not in text format
'content': 'CmRlbGV0ZSBmcm9tICB7eyBwYXJhbXMuY3VyYXRlZF9kYXRhYmFzZV9uYW1l\n'
'IH19LmNybS5BRkZJTElBVEVfUFJJQ0lORzsKCklOU0VSVCBJTlRPICB7eyBw\n'
'TkcKICB3aGVyZSAxID0gMQogIAogIDsKICAKICAKCg==\n'
How do I convert the 'content' to text
For full context, I am trying to download code from the GitHub repo as text to store in our Database

Are you using python and in particular the requests package? If so, I think you could use r.text.
docs: https://pypi.org/project/requests/

Thanks to the article #CherryDT pointed me to, I was able to get the text
import base64
r_dict = r.json()
content = r_dict['content']
base64.b64decode(content)

Related

Web-scraping using python

I am trying to extract data from this website, It is almost impossible to scrape as after any search it's not changing its URL.
I want to search based on PUBLISHER IPI '00144443097' and extract all data they have insideclass="items-container".
My code
quote_page = 'https://portal.themlc.com/search'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name_box = soup.find('section', attrs={'class': 'items-container'})
name = name_box.text
print(name)
Here as the URL after search doesn't change it's not giving me any value.
After extracting values I want to sort them in pandas
When the url doesn't change, you can use the developer tools to see if an api is being called. In this case there are two apis. One gives basic information about the writer and the other gives the information on the works. You can parse the json response however you wish from here.
Note: this a post, not a get
url = 'https://api.ptl.themlc.com/api/search/writer?page=1&limit=10'
payload = {'writerIpi': "00144443097"}
requests.post(url, json=payload).json()
url = 'https://api.ptl.themlc.com/api/search/work?page=1&limit=10'
payload = {'writerIpi': "00144443097"}
requests.post(url, json=payload).json()
url = 'https://api.ptl.themlc.com/api/search/publisher?page=1&limit=10'
payload = {"publisherIpi":"00144443097"}
requests.post(url, json=payload).json()
# this url gets the 161 works for the publisheripid you want. it's convoluted, but you may be able to automate, but I used developer tools to find the right publisheripid
url = 'https://api.ptl.themlc.com/api/search/work?page=1&limit=10'
payload = {'publisherIpId': "7305902"}
requests.post(url, json=payload).json()
To find the publisheripid, you need to open some of works within the author and look for the work endpoint. hopefully this image loads correctly

GridFs read PDF

I am trying to build a financial dashboard with Flask and pymongo. The starting point is a flask form which saves data in a MongoDB database. One of the fields in the form is a FileField (wtforms) which allows the upload of a PDF, which is then stored in MongoDB with GridFS.
Now I manage to save the pdf and I can see the resulting entries within the .files and .chunks collections. Now I would like to build a function that retrieves the PDFs and analyses them with some basic NLP, however I struggle with the getting meaningful data.
When I do:
storage = gridfs.GridFS(db, collection)
data = storage.get('some id')
a = data.read()
The result is a binary file. If I continue with:
with open(data, 'rb') as f:
b = f.read()
The result is "ValueError: embedded null byte or sometimes an empty "byte string".
Any help on this?
To follow up on the above, I found a solution for myself that consists in 2 separate functions:
(1) Upon upload of the form and before uploading the files to MongoDB, I apply a function based on pdfminer that extracts the string content of the PDF and tranform it into a list of sentences using NLTK. I will then store this list in the .files via the storage.put(file, sent_list = sent_list) #sent_list being the variable name of the list of sentences.
Whenever I wish to run NLP operations on the file, I will just call the "sent_list" variable from mongodb.
(2) If I wish to display the stored pdf in its original content however, I included the following function as a separate route.
storage = GridFS(db, collection)
data = storage.get_last_version(filename)
response = make_response(data.read())
extension = data.filename.split('.')[-1]
response.headers['Content-Type'] = f'application/{extension}'
response.headers['Content-Disposition'] = f'inline; filename={data.filename}'
return response
(2) will open a new tab in my flask app showing the .pdf file in its original format.
I hope this helps anyone coming across a similar problem in the future.

Displaying jpegPhoto attribute from LDAP in Websphere Portal

I have a requirement wherein I need to display details of users after searching from LDAP using PUMA API.
I'm having troubling displaying the jpegPhoto of the user.
Here's what I'm doing:
First I'm querying the user by using:
PumaLocator.findUsersByAttribute(uid, user);
After that we get a User list Object.
For each user, we fetch all the attributes which is in the form of a Map.
I'm getting the following value for while retrieving the jpegPhoto:
map.get("jpegPhoto") --> [B#7a2f8a54
It seems that the Puma API returns a Binary string. Does anyone know how to display this in the portlet?
Any help would be greatly appreciated. Thank you
I think it more likely this is a byte[] array than a string.
You can probably base64 encode this binary into an encoded string and use it in an HTML image tag.
byte[] photoBytes = (byte[]) map.get("jpegPhoto");
String encodedPhoto = org.apache.commons.codec.binary.Base64.encodeBase64(photoBytes);
Then later, perhaps in a JSP (example assumes JSTL variable in scope named encodedPhoto):
<img src="data:image/jpeg;base64,${encodedPhoto}"/>
A way of doing this is to access the image through the portal service servlet instead of using your own servlet: /wps/um/secure/users/profiles/[oid]/jpegPhoto, in which you replace [oid] with the ObjectID of the user. This ID string can be obtained using IdentificationMgr.getIdentification().serialize(user.getObjectID())
The photo of the current user you can access using: /wps/um/secure/currentuser/profile/jpegPhoto
Portal is giving you data as byte array. It will never give you as URL.
You can write a servlet which will write this byte array to output stream.
Use that servlet URL as src of tag. It will start rendering on browser.
FYI, you can't print byte array to browser and expect it to treat as image.
Image or any other files has to come as a resource not as content.

HTML File API: getting the text 'onprogress'

Using the HTML5 File API, I wonder if I can process the content of a file on the fly.
I know I can get the content of the file when onload is called:
function fileLoaded(e)
{
alert("content is "+e.target.result);
}
but can I get the current content when onprogress is called ?
Thanks.
Seems like yes, according to the spec. The onprogress event will fill the result property of your FileReader as more data is read in. However, as Matt pointed out, if you're only interested in a portion of the file in the first place, only read that section:
var blob = file.webkitSlice|mozSlice(startByte, stopByte, contentType);
reader.readAsBinaryString(blob);
I don't think so, but look at this example which shows you how to read slices of files.

What does visibleContentsAsDataURL exactly do?

I am currently trying to build my first Safari extension. The SafariBrowserTab Class has a Method called "visibleContentsAsDataURL".
I don't exactly understand what it does and can't get it to work.
The docs just say: "Returns a data URL for an image of the visible contents of the tab."
What does it mean? That I get the URL of a screenshot of the tabs' content back? Can someone explain me?
Thanks!
I think it returns what is effectively a screenshot of the tab. The format is explained here
http://en.wikipedia.org/wiki/Data_URI_scheme
According to Apple's Safari reference documentation the return value is "a base-64 encoded PNG."
A data URL is a specal type of url basically consisting of a mimetype and data, in the case of png you'll get something along the lines of:
data:image/png;base64;lotsofstuff
You can then do whatever you want with it (it's just a string), or if you want to display the content:
img = new Image();
img.src = someTab.visibleContentsAsDataURL();
someElement.appendChild(img);
or
someCanvasContext.drawImage(img);
etc