How to get all the data from a DICOM file with Imebra

I am working on a project that integrates Imebra into an Android application. The application is supposed to extract all the data from a given DICOM file and write it to an XML file. I need a little help with this. For example, I don't know how to get all the tags (with their VRs) that a given DICOM file contains, instead of fetching them one by one by tag ID.
Thank you for your help.

Load the file using CodecFactory.load(filename).
Then use DataSet.getTags() to retrieve a list of the tags stored in the DICOM structure.
The returned TagsIds object is a list containing every TagId. Scan each tag ID and read its value with DataSet.getString() (which returns the value as a string) and its VR with DataSet.getDataType().
If DataSet.getString() fails, you are dealing with a sequence (an embedded DICOM structure), which can be retrieved with DataSet.getSequenceItem().
You can use the static method DicomDictionary.getTagName() to get a description of a particular tag.
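The traversal described above (loop over every tag, write plain values directly, recurse into sequences) can be sketched in Python. This is a language-neutral illustration, not Imebra code: the nested dict is a hypothetical stand-in for an Imebra DataSet, and `dataset_to_xml` and the XML element names are invented for the example. In the Android app, the `isinstance(value, list)` branch corresponds to `getString()` failing (or `getDataType()` reporting SQ), and iterating the list corresponds to calling `getSequenceItem()`.

```python
import xml.etree.ElementTree as ET


def dataset_to_xml(dataset, parent):
    """Append one <tag> element per entry of a (hypothetical) data set to `parent`.

    `dataset` maps a (group, element) tag ID to either
    (vr, string_value) for a plain element, or
    ("SQ", [nested_dataset, ...]) for a sequence of embedded data sets.
    """
    for (group, element), (vr, value) in sorted(dataset.items()):
        node = ET.SubElement(parent, "tag",
                             group=f"{group:04X}", element=f"{element:04X}", vr=vr)
        if isinstance(value, list):
            # Sequence: each item is itself a data set, so recurse into it.
            for item in value:
                dataset_to_xml(item, ET.SubElement(node, "item"))
        else:
            # Plain element: store the string value as the node text.
            node.text = value
    return parent


# Hypothetical sample: a patient name plus a one-item sequence.
sample = {
    (0x0010, 0x0010): ("PN", "Doe^John"),
    (0x0008, 0x1115): ("SQ", [{(0x0008, 0x1150): ("UI", "1.2.840.10008.5.1.4.1.1.2")}]),
}
root = dataset_to_xml(sample, ET.Element("dicom"))
print(ET.tostring(root, encoding="unicode"))
```

Writing the result with `ET.ElementTree(root).write("out.xml")` would produce the .xml file the question asks for.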

Related

GridFs read PDF

I am trying to build a financial dashboard with Flask and pymongo. The starting point is a flask form which saves data in a MongoDB database. One of the fields in the form is a FileField (wtforms) which allows the upload of a PDF, which is then stored in MongoDB with GridFS.
I can now save the PDF, and I can see the resulting entries in the .files and .chunks collections. Next I would like to build a function that retrieves the PDFs and analyses them with some basic NLP, but I am struggling to get meaningful data out of them.
When I do:
storage = gridfs.GridFS(db, collection)
data = storage.get('some id')
a = data.read()
The result is a bytes object. If I continue with:
with open(data, 'rb') as f:
    b = f.read()
The result is "ValueError: embedded null byte", or sometimes an empty byte string.
Any help on this?
To follow up on the above, I found a solution for myself that consists of two separate functions:
(1) When the form is submitted, and before uploading the files to MongoDB, I apply a function based on pdfminer that extracts the text content of the PDF and transforms it into a list of sentences using NLTK. I then store this list in the .files collection via storage.put(file, sent_list=sent_list), where sent_list is the variable holding the list of sentences.
Whenever I want to run NLP operations on the file, I just read the "sent_list" field from MongoDB.
(2) To display the stored PDF in its original form, I added the following function as a separate route:
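Both errors in the question are consistent with one mix-up: `storage.get(...)` returns a GridOut, a file-like object whose `read()` already yields the PDF's bytes, so there is nothing left to `open()`. Passing those bytes to `open()` makes Python treat them as a file name (hence "embedded null byte"), and calling `read()` again on a stream that is already at its end returns an empty byte string (GridOut has `seek(0)` to rewind). The sketch below uses hypothetical sample bytes and `io.BytesIO` so it runs without MongoDB; pdfminer.six's `extract_text`, as used in answer (1), accepts either a path or a file-like object like this one.

```python
import io

# Hypothetical stand-in for what storage.get(file_id).read() returns:
# the raw bytes of the stored PDF (real PDFs also start with b"%PDF-").
pdf_bytes = b"%PDF-1.4\n\x00binary payload"

# open() expects a file *path*; handing it file content fails, and the
# NUL byte inside the content triggers this ValueError.
try:
    open(pdf_bytes, "rb")
except ValueError as exc:
    print("open() failed:", exc)  # open() failed: embedded null byte

# Wrap the bytes in an in-memory binary stream instead; it behaves like a
# file opened with "rb" and can be passed to code that expects a file object.
stream = io.BytesIO(pdf_bytes)
print(stream.read(5))  # b'%PDF-'
stream.read()          # consume the rest of the stream
print(stream.read())   # b'' because the stream is already at its end
stream.seek(0)         # rewind before reading again
```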
storage = GridFS(db, collection)
data = storage.get_last_version(filename)
response = make_response(data.read())
extension = data.filename.split('.')[-1]
response.headers['Content-Type'] = f'application/{extension}'
response.headers['Content-Disposition'] = f'inline; filename={data.filename}'
return response
Route (2) opens a new tab in my Flask app that shows the PDF in its original format.
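In route (2), `f'application/{extension}'` happens to be right for PDFs but not for other file types: a .png would be labelled application/png instead of image/png. A hedged alternative is to let the standard library's `mimetypes` module choose the type. `display_headers` is a hypothetical helper, and in the Flask route you would copy its result into `response.headers`.

```python
import mimetypes


def display_headers(filename):
    """Return the headers for serving a stored file inline in the browser."""
    # guess_type maps the file extension to a registered MIME type,
    # or returns None when the extension is unknown.
    content_type, _encoding = mimetypes.guess_type(filename)
    if content_type is None:
        content_type = "application/octet-stream"  # generic binary fallback
    return {
        "Content-Type": content_type,
        # Quoting the name keeps filenames with spaces intact.
        "Content-Disposition": f'inline; filename="{filename}"',
    }


print(display_headers("report.pdf")["Content-Type"])  # application/pdf
```

Usage in the route would be along the lines of `response.headers.update(display_headers(data.filename))`.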
I hope this helps anyone coming across a similar problem in the future.

Is there a way to list the directories using PySpark in a notebook?

I'm trying to see every file in a certain directory, but since each file in the directory is very large, I can't use sc.wholeTextFiles or sc.textFile. I just want to get the file names, and then load a file in a different cell if needed. I can access the files fine using Cyberduck, and it shows their names.
Example: the link for one set of data is "name:///mainfolder/date/sectionsofdate/indiviual_files.gz", and it works, but I want to see the names of the files in "/mainfolder/date" and in "/mainfolder/date/sectionsofdate" without loading them all via sc.textFile or sc.wholeTextFiles. Both of those functions work, so I know my keys are correct, but loading the files takes too long.
Since the list of files can be retrieved by a single node, you can simply list the files in the directory without going through Spark.
wholeTextFiles returns (path, content) tuples, but I don't know whether the content is read lazily enough to make keeping only the first element of each tuple cheap.
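Following the single-node idea, here is a minimal sketch for the case where the directory is reachable as an ordinary path from the driver. That is an assumption: for HDFS or an object store, you would use that storage's own listing client instead, and that is not shown here. The sketch reads only directory metadata, so it stays fast no matter how large the files are. `list_entries` and the demo file names are hypothetical.

```python
import os
import tempfile


def list_entries(directory):
    """Return sorted (name, is_dir, size_in_bytes) tuples for `directory`.

    Only metadata is read; no file content is loaded.
    """
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            is_dir = entry.is_dir()
            size = 0 if is_dir else entry.stat().st_size
            entries.append((entry.name, is_dir, size))
    return sorted(entries)


# Demo on a throwaway directory that mimics /mainfolder/date.
with tempfile.TemporaryDirectory() as date_dir:
    os.mkdir(os.path.join(date_dir, "sectionsofdate"))
    with open(os.path.join(date_dir, "part-0001.gz"), "wb") as f:
        f.write(b"12345")
    demo = list_entries(date_dir)

print(demo)  # [('part-0001.gz', False, 5), ('sectionsofdate', True, 0)]
```

In a later cell, you would pass just the one path you need to sc.textFile.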

Displaying jpegPhoto attribute from LDAP in Websphere Portal

I have a requirement to display user details after searching LDAP with the PUMA API.
I'm having trouble displaying the user's jpegPhoto.
Here's what I'm doing:
First I'm querying the user by using:
PumaLocator.findUsersByAttribute(uid, user);
This returns a list of User objects.
For each user, we fetch all the attributes, which come back as a Map.
I get the following value when retrieving jpegPhoto:
map.get("jpegPhoto") --> [B@7a2f8a54
It seems that the Puma API returns a Binary string. Does anyone know how to display this in the portlet?
Any help would be greatly appreciated. Thank you
I think this is more likely a byte[] array than a string: [B@... is how Java prints a byte array.
You can probably base64 encode this binary into an encoded string and use it in an HTML image tag.
byte[] photoBytes = (byte[]) map.get("jpegPhoto");
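// encodeBase64String returns a String; plain encodeBase64 returns byte[] and would not compile here.
String encodedPhoto = org.apache.commons.codec.binary.Base64.encodeBase64String(photoBytes);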
Then later, perhaps in a JSP (example assumes JSTL variable in scope named encodedPhoto):
<img src="data:image/jpeg;base64,${encodedPhoto}"/>
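The data-URI approach above can be checked with a small Python sketch (the Java version above does the same with commons-codec). `jpeg_data_uri` is a hypothetical helper, and the sample bytes are only the standard JPEG start-of-image marker, not a real photo.

```python
import base64


def jpeg_data_uri(photo_bytes):
    """Return a data: URI that an <img src="..."> tag can display inline."""
    # b64encode returns bytes; decode to ASCII text for use inside HTML.
    encoded = base64.b64encode(photo_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"


# JPEG files begin with FF D8 FF; a real photo would be much longer.
sample = bytes([0xFF, 0xD8, 0xFF, 0xE0])
print(f'<img src="{jpeg_data_uri(sample)}"/>')
# <img src="data:image/jpeg;base64,/9j/4A=="/>
```

Inline data URIs grow the page by about a third of the image size, so for large photos a separately served image URL (as in the other answers) may be preferable.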
A way of doing this is to access the image through the portal service servlet instead of using your own servlet: /wps/um/secure/users/profiles/[oid]/jpegPhoto, in which you replace [oid] with the ObjectID of the user. This ID string can be obtained using IdentificationMgr.getIdentification().serialize(user.getObjectID())
The photo of the current user you can access using: /wps/um/secure/currentuser/profile/jpegPhoto
The portal gives you the data as a byte array; it will never give you a URL.
You can write a servlet that writes this byte array to the response output stream.
Use that servlet's URL as the src of an <img> tag, and the browser will render the image.
FYI, you can't print a byte array into the page and expect the browser to treat it as an image.
An image, like any other file, has to be delivered as a resource with its own URL, not as page content.

How to read data (a String) from a text file and send it to a particular field in an application using Java (Selenium WebDriver)

Reading the data from the text file works, but I am unable to send that data to a particular field in the application.
Store the text in a String variable and pass it to the sendKeys(yourString) method of the WebElement for that field.
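Here is a minimal sketch of that flow using the Python binding, where the call is `send_keys` (the Java equivalent is `element.sendKeys(text)`). `fill_field_from_file` and `RecordingField` are hypothetical. `RecordingField` is a stand-in for a WebElement so the sketch runs without a browser; with Selenium 4 you would pass a real element from `driver.find_element(By.ID, "...")`.

```python
import tempfile
from pathlib import Path


def fill_field_from_file(element, path):
    """Read a text file and type its contents into a form field."""
    # strip() removes the trailing newline most editors add, along with
    # any other leading or trailing whitespace.
    text = Path(path).read_text(encoding="utf-8").strip()
    element.send_keys(text)
    return text


class RecordingField:
    """Hypothetical stand-in for a WebElement: records what was typed."""

    def __init__(self):
        self.typed = []

    def send_keys(self, text):
        self.typed.append(text)


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "input.txt"
    data_file.write_text("hello world\n", encoding="utf-8")
    field = RecordingField()
    fill_field_from_file(field, data_file)

print(field.typed)  # ['hello world']
```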

Plone 4 - Get url of a file in a plone.app.blob.field.FileField

I have a custom content type with three FileFields (plone.app.blob.field.FileField), and I want to get their URLs so I can put them in my custom view and let people download the files.
However, when using Clouseau to test and debug, I call:
context.getFirst_file().absolute_url()
Where getFirst_file() is the accessor to the first file (field called 'first_file').
The url returned is 'http://foo/.../eat.00001', where 'eat.00001' is the object of my custom type that contains the file fields...
The interesting thing is, if I call:
context.getFirst_file().getContentType()
It returns 'application/pdf', which is correct since it's a pdf file.
I'm pretty lost here, any help is appreciated. Thanks in advance!
File fields do not support an absolute_url method; instead, through acquisition you inherit the method from the object itself, which explains the results you see. Moreover, calling getFirst_field() returns the actual downloadable contents of the field, not the field itself, which could provide such information.
Instead, you should use the at_download script appended to the object URL, followed by the field id:
<a tal:attributes="href string:${context/absolute_url}/at_download/first_field">First File</a>
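If the link is built in Python, for example in a browser view class, rather than in the page template, the same URL can be assembled as in this sketch. `at_download_url` is a hypothetical helper and the example URL is made up; in a view you would call it with `context.absolute_url()` and the field id.

```python
def at_download_url(object_url, field_name):
    """Return the URL that serves a file field through Archetypes' at_download.

    object_url is the content object's absolute_url(); field_name is the
    schema field id (for example 'first_field').
    """
    return f"{object_url.rstrip('/')}/at_download/{field_name}"


print(at_download_url("http://example.com/plone/eat.00001", "first_field"))
# http://example.com/plone/eat.00001/at_download/first_field
```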
You can also re-use the Archetypes widget for the field, by passing the field name to the widget method:
<metal:field use-macro="python:context.widget('first_field', mode='view')">
First File
</metal:field>
This will display the file size, icon (if available), the filename and the file mime type.
In both these examples, I assumed the name of the field is 'first_field'.