Finding a data source for demographics by zip code - datasource

Does anyone know of specific data sources that provide demographics such as age or income by ZIP code? I was able to find data for specific regions like NYC or other locations, but not everything in one source. I checked the census data as well but was not successful in finding it there.

I know this is a little late, but there is an API available with a lot of demographic-related data by ZIP code. In case anyone else is looking:
https://rapidapi.com/aptitudeapps/api/zip-code-master/details

Suggestions for file and data transforms using SQL Query Results to manipulate existing PDF Files

Apologies if something similar to the question I'm asking has already been addressed. I'm not even sure how to best frame my question but I haven't been able to find any posts that are obviously germane. I'm hoping someone has some experience with this and might be willing to offer some suggestions. My company has already contracted to have the bulk of our database converted to HTML for ETL purposes and we simply can't afford to double the already barely-manageable costs of this project by adding this additional requirement to the scope.
We have a SQL database from an EMR software vendor that our company has now left. Due to recent economic factors, we just can't afford to stay with them any longer. When we left, this ex-vendor begrudgingly provided us with a backup copy of our SQL database along with copies of all the scanned images our users have uploaded via their application GUI over the years. I was told they stored the uploads as BLOB data, but it turns out they did not. They weren't actually storing the files in the database at all. Instead, they moved each image to a storage location and wrote the ID, DocType, Filename, DirPath and other document information to the Document table of the DB. It makes sense but leaves us in a bind, mainly because the filename appears to have been randomly generated at upload. So we now have 50,000 image files with unintelligible filenames stored in a date-based folder structure with no way to correlate any of them with the patients to whom they belong. A couple of examples are as follows:
/root/2020/05102019/69353829-e46b-47e7-ab56-a1762424f0dd.pdf
/root/2014/09282017/385ba21d-e108-4cbb-9287-91110c16edb0.jpg
I compiled a list of attribs so I can make any of them available to the transform. I pulled:
SELECT * FROM document d
JOIN patients p ON d.PatientId = p.pid
JOIN users u ON d.PatientId = u.uid
WHERE u.UserType = '3' AND d.fileformat is NOT NULL AND d.dirpath LIKE 'm%'
ORDER BY u.ulname;
This gave me all patient and document attribs, resulting in a list with 197 columns. The challenge is that the new EMR vendor can only import these files if all the files for each patient are in a dedicated folder at the patient level, so I need the files in a new folder structure. I am trying to do it without abandoning things like PatientID, Scan Date, Description (the customName column), Scanned By, and possibly a couple of other points.
I'll probably end up making the file name something like a concat of customName+docID for identification purposes. Then I'll just need to get the files in something like a /Patient/Docs.extension folder structure.
I went ahead and flattened all the files into a single folder figuring that would make it easier to manipulate. I batched them out like so:
md "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\All\"
cd /d "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\"
for /r %d in (*) do copy "%d" "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\All\"
Now I have them all together.
I still have to figure out how to get them into the new folder structure by patient though.
Just to have it mentioned, I was originally considering using SQL so I could recreate the files and assign the desired attribs as file attribs in one step.
To answer the question asked about the HTML conversion, we have tons of Progress Notes, Doctors Notes, Prescriptions, etc in the database. The only way to get them to the new EMR is to export them to HTML and group them at the patient level so the new vendor can import them.
Honestly, after having to wrestle with all this garbage, I would prefer to avoid this situation in the future by refusing to upload them to the new EMR at all. Instead, just put all these documents on OUR file server and give the new EMR a hyper-link to insert into each patient's patient record that would open all the patient files. The new EMR is browser-based so it could be feasible but I doubt I'll be able to get them to write files to our file server moving forward so doing so would likely just end up making the end-user experience more disjointed.
I don't think your contractors did anything wrong tbh. Taking uploaded files with all their problem characters/duplicated names (got more than one patient called JohnSmith.jpg?) etc and renaming them to a GUID so they can coexist alongside other images without overwriting them is a) sensible and b) what I would do.
I also wouldn't store images inside a database, as then the only thing you can do with them is get them out again; something you have to do every time you want to do anything with them. Being able to map an images folder to a URL on your web server and then send HTML using just the file name means that the web server can serve the image without having to pull it out of the db; the db doesn't have to involve itself in pointless IO.
The way to correlate these images with the patients to whom they belong is done by the database. Somewhere else in the db structure will be eg a Patient record with a DocumentId column that links to this document record or a PatientDocuments table that has PatientId/DocumentId pairs.
If there is not, then storing the document bytes in the db won't have helped relate them to the patient, because this relation is not about where the bytes of an image are, it's about what other data was stored to make for a usable system. As it stands your thoughts on the matter, of uploading tens of thousands of images into a db just so you can... er.. get them all out again, would seem to indicate you haven't yet fully grasped the reasons behind why your contractors did what they did.
Because you're under the impression that you can do this, you seem to know how the db relates a document to a patient (if it doesn't, then your proposed process will fail), and as such you can arrange for a suitable renaming process without needing to move the image data anywhere. In essence, you're failing to see that a file system storing file data against unique paths is no different to a database table storing file data against unique ids. Your documents table clearly links to your file system, so the file system can be viewed as an extension of the documents table. You need the other tables in the db to make sense of the files, but you need the other tables in a db to make sense of any table in a db. These are key concepts of modelling related data.
I don't recommend you undertake the process you propose, but I'm sure that won't dissuade you. Consider then (because you didn't really post any details we can work with) this assumed scenario:
Patients
Name,DocumentId
John Smith,1
Jane Doe,2
Documents
Id,FilePath
1,'/root/2020/05102019/69353829-e46b-47e7-ab56-a1762424f0dd.pdf'
2,'/root/2014/09282017/385ba21d-e108-4cbb-9287-91110c16edb0.jpg'
SELECT CONCAT('REN ', d.FilePath, ' "', p.Name, RIGHT(d.FilePath, 4), '"')
FROM
Patients p
INNER JOIN Documents d ON p.DocumentId = d.Id
The results of the query will essentially be a batch file full of rename commands that renames all the files into a single folder, organized by patient name.
And now all your multiple patients with the same names will overwrite each other and everything will end up in a mess.
It also makes my point for me about "don't store files in the db": look how easy it is to manipulate files when they're in a file system, using existing commands that understand filesystems and files and do things like rename files, extract EXIF data, rotate, resize and print. If all those images were in your db, the only thing you could do with them is get them out again. SQL Server cannot rotate, resize or print BLOB data, but there are thousands of tools out there that understand files and can convert them. Those tools cannot understand your db, so putting files into a db saddles you with the problem that they become useless until dug out again.
Your contractors may not have been so daft as you think; pause a moment before you set about hacking apart all they did, and question whether your driver for doing so is actually correct. If Jane from reception needs to see a picture of John Smith with driver's license XY1234 to ID him, don't provide her with a shared drive full of everyone's pictures and let her double click, drag and accidentally delete her way around the file system. Provide her with an app that looks in the db, gets the unintelligible but helpfully unique filename off disk and opens it for her to view. And make the file system read only to everyone other than the app, so that users can't break things.
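For the per-patient folder layout the question asks for, the name-collision problem can be sidestepped by keying folders on the patient ID and keeping the document ID in each file name. The following is only a rough sketch under stated assumptions: the rows are taken to be the query results exported as dictionaries, the keys PatientId, docID, customName and Filename are guesses at the real column names, and the helper names are hypothetical. It copies rather than moves, so the flattened originals stay untouched.

```python
import shutil
from pathlib import Path

def safe_name(text):
    """Replace characters that are unsafe in file names with underscores."""
    cleaned = "".join(c if c.isalnum() or c in "-_ " else "_" for c in text)
    return cleaned.strip() or "untitled"

def plan_copies(rows, flat_dir, out_root):
    """Yield (source, destination) pairs, one per document row.

    Each row is a dict with assumed keys: PatientId, docID,
    customName and Filename (the GUID-style name on disk).
    """
    for row in rows:
        src = Path(flat_dir) / row["Filename"]
        ext = Path(row["Filename"]).suffix
        # One folder per patient ID; the docID keeps every file name unique.
        dest_name = f"{safe_name(row['customName'])}_{row['docID']}{ext}"
        yield src, Path(out_root) / str(row["PatientId"]) / dest_name

def copy_documents(rows, flat_dir, out_root):
    """Copy each file into its patient folder and return the new paths."""
    copied = []
    for src, dest in plan_copies(rows, flat_dir, out_root):
        if not src.exists():
            continue  # the row points at a file that was not delivered
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```

Because the database remains the source of truth, attributes such as scan date or scanned-by can be exported alongside the files from the same rows instead of being encoded into the file names.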

Binary Sankey Diagram in Tableau - Not All Activities Match The Corresponding Number of KPIs

How do I link my activities variable to only the corresponding KPIs variable?
Using guidance from a number of sources, but primarily the genius of Jeffery Shafer articulated through the SuperDataScience video, I built a Sankey Diagram for my work. For the most part it works, however, I have been trying to figure out how to adjust my Sankey Diagram model to line up each activity with ONLY the corresponding KPIs, but am having no luck.
The data structure looks like this:
You'll note I changed the binary value to "", 2 instead of 0, 1 as it makes visual calculations easier. For the "Viz" variable, I have "Activity" for the raw data set, then I copy/paste/replicate the data to mirror the data (required for the model) but with "KPI" for the mirrored data.
In the following image, you'll see my main issue is that the smallest represented activity still shows as corresponding to all KPIs when in fact it does not. I want activity to line up only with the corresponding KPIs as some activities don't correspond with all, or even any, KPIs.
Finally, here is the model very similar to what the above video link shows:
Can someone help provide insight into how I can adjust the model to fit activities linking only to corresponding KPIs? I appreciate any insight. Thanks!
I have a solution to the issue, thanks to a helpful Tableau support member named Anthony. The problem was in the data structure. The data associated every "Activities" value with every "KPI" value, rather than associating each "Activities" value only with its own "KPI" values as Tableau requires. To achieve the desired result, the data needs to be restructured to contain a row only for each valid "Activities" and "KPI" combination. See the visual below where data is removed to format properly:
Once the table is restructured, the model should produce the desired visual result. It works like a charm!
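The restructuring could also be scripted before the data reaches Tableau. This is a minimal sketch only, assuming a wide layout with one row per activity and one column per KPI holding either "" or 2 as described in the question; the function name and the column names are hypothetical, since the actual table is not shown here.

```python
def valid_pairs(wide_rows, kpi_columns):
    """Return one (Activity, KPI) row for each KPI flagged on an activity."""
    pairs = []
    for row in wide_rows:
        for kpi in kpi_columns:
            # "" means the activity does not correspond to this KPI.
            if row.get(kpi) == 2:
                pairs.append({"Activity": row["Activity"], "KPI": kpi})
    return pairs
```

An activity with no flagged KPI produces no rows at all, which matches the requirement that some activities link to no KPIs.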
Good luck out there!

API for getting medicine detail from UPC bar code

I need to implement an API which can fetch the medicine details from a UPC bar code. All the solutions I found on the web provide ways to read the bar code, but I did not find any API that could get me the product information like medicine Name, Manufacturer, Dosage, Expiry, Batch No, etc. from the bar code.
Any suggestion would be appreciated.
Assuming you're operating in the US, usually the UPC is actually a shortened version of the NDC, usually with the "extraneous" 0's left out. You need to convert to NDC and then look that data up against a drug database, like the FDA database. Usually however, pharmacies will buy a database from one of the drug database suppliers (e.g. Etreby or Elsevier) because those are curated and have a lot more detail and are usually easier to work with for the sorts of queries a pharmacy might want to make.
Edit: Per my comment below, it looks like you can query the FDA database via UPC without converting to NDC first.
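Where the UPC-to-NDC conversion is still needed, it might look roughly like the sketch below. It assumes a 12-digit UPC-A whose leading digit is 3 (the number system digit used for drug products), followed by the 10-digit NDC and a check digit. The UPC alone does not say which of the three 10-digit NDC layouts (4-4-2, 5-3-2, 5-4-1) applies, so the sketch returns all three zero-padded 11-digit candidates to try against a drug database. The function names are hypothetical.

```python
def upc_check_digit(first11):
    """Compute the UPC-A check digit for the first 11 digits."""
    odd = sum(int(d) for d in first11[0::2])   # positions 1, 3, ..., 11
    even = sum(int(d) for d in first11[1::2])  # positions 2, 4, ..., 10
    return (10 - (odd * 3 + even) % 10) % 10

def ndc_candidates(upc):
    """Return the possible 11-digit NDCs embedded in a 12-digit drug UPC-A."""
    if len(upc) != 12 or not upc.isdigit():
        raise ValueError("expected a 12-digit UPC-A")
    if upc_check_digit(upc[:11]) != int(upc[11]):
        raise ValueError("check digit mismatch")
    if upc[0] != "3":
        raise ValueError("leading digit is not 3, so this is not assumed to be a drug UPC")
    ndc10 = upc[1:11]
    # Pad whichever segment is short so the result is always 5-4-2.
    return {
        "4-4-2": "0" + ndc10,                     # short labeler code
        "5-3-2": ndc10[:5] + "0" + ndc10[5:],     # short product code
        "5-4-1": ndc10[:9] + "0" + ndc10[9:],     # short package code
    }
```

Note that details such as expiry date and batch number are not part of the NDC, so a lookup keyed on it would not be expected to return them.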

I'm using Google Drive Spreadsheets and need to create a report - how do I do this logical query?

I've got a spreadsheet that has the headings Operation, Priority and Specialty so all of the information for a specific operation is on one line and is stored as text.
I need to create a report for each Specialty that tells me the number of Operations done, which is easy (using COUNTIF), but also how many routine or urgent Priority Operations there were.
This would be easy in a database. I'd do it like this.
SELECT COUNT(*) FROM OPERATION_LIST
WHERE Specialty = 'Cardiac'
AND Priority = 'Routine';
I cannot for the life of me work out how to do this in Spreadsheet though.
I know Google Drive Spreadsheets have a QUERY function but I cannot work out how to use it for this and Googling for answers is no help. I'm sure it'll be obvious when I see it but I've been working on this for days now, with no luck.
Can anyone help?
I would do it like this:
=query(A:C, "select A,count(B) where B = 'Routine' and C = 'Cardiac' group by A label count(B) 'Count'",1)
This assumes that you have a header row (Operation, Priority, Specialty, which correspond to columns A, B and C in the query, respectively).
Hope this helps.

Searching by Zip Code proximity - MySql

I'm having some trouble getting a search by zip code proximity query working. I've searched and searched Google, but everything I find is either way too slow or something I can't get working. Here's the issue:
I have a database with a table with all the US Zip Codes (~70,500 of them), and I have a table of several thousand stores (~10,000+), which includes their zip code. I need to be able to provide a zip code and return a list of the closest stores to that zip code, sorted by distance.
Can anyone point me to a good resource for this that they have used and can handle this much load, or share a query they've used that works and is fairly quick about it? It would be MUCH appreciated. Thanks!
You should build a table that has each zip code with an associated latitude and longitude. When someone enters a zip and a distance, you calculate the range of latitudes and longitudes that fall within it, and then select all the zip codes that fall within that bounding box. Then you select any stores that have zip codes within that set, calculate their distance from the provided zip, and sort by it. (Use the haversine formula for calculating the distance between points on a globe.)
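The bounding-box-then-haversine approach can be sketched in Python as follows. This is an illustration under assumptions, not a tuned implementation: zip_coords is a hypothetical dictionary mapping each zip code to (latitude, longitude), stores is a list of (name, zip) pairs, and the box calculation assumes points well away from the poles and the 180° meridian, which holds for US zip codes. In MySQL the box would become a WHERE clause on indexed latitude and longitude columns.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle."""
    ang = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(ang)
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

def nearest_stores(origin_zip, stores, zip_coords, radius_miles):
    """Return (store_name, miles) pairs within the radius, nearest first."""
    lat0, lon0 = zip_coords[origin_zip]
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat0, lon0, radius_miles)
    results = []
    for name, store_zip in stores:
        lat, lon = zip_coords[store_zip]
        # Cheap box test first; the exact distance is computed only for survivors.
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue
        miles = haversine_miles(lat0, lon0, lat, lon)
        if miles <= radius_miles:
            results.append((name, miles))
    return sorted(results, key=lambda pair: pair[1])
```

The box test matters because it can use ordinary range indexes, so the trigonometry runs only on the small set of rows that survive it.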
If speed is your main concern, you might want to precompute all the distances. Have a table that contains a store zip code column, the other zip code, and a distance column. You can restrict the other zip codes to zip codes within a certain distance (say 100 miles, or what have you) if you need to cut down on rows. If you don't restrict the links based on distance, you'll have a table with > 700 million rows, but you could certainly do fast lookups.
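A rough sketch of building that precomputed table is shown below, again assuming a hypothetical zip_coords dictionary of zip code to (latitude, longitude) and a 100-mile cutoff as in the example above. The resulting rows would be bulk-loaded into a table indexed on the searched zip code, so a lookup becomes a single indexed read ordered by distance.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))

def precompute_distances(store_zips, zip_coords, max_miles=100.0):
    """Return (store_zip, other_zip, miles) rows for pairs within max_miles."""
    rows = []
    # Deduplicate first: many stores share a zip code.
    for store_zip in sorted(set(store_zips)):
        slat, slon = zip_coords[store_zip]
        for other_zip, (lat, lon) in zip_coords.items():
            miles = haversine_miles(slat, slon, lat, lon)
            if miles <= max_miles:
                rows.append((store_zip, other_zip, round(miles, 2)))
    return rows
```

The trade-off is storage and rebuild time against query speed: the table has to be regenerated whenever stores move into a zip code that was not covered before.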