Dealing with special characters from data schema in GraphQL - api

I am dealing with data that is structured (by a third party) with special characters in the key names, like so:
"pageFansGenderAge": {
"current": {
"U.13-17": 1,
"U.55-64": 246,
"M.55-64": 11925,
"U.35-44": 370,
"F.45-54": 16443,
"M.18-24": 8996,
"M.35-44": 20641,
"F.25-34": 11687,
"U.65+": 148,
"U.18-24": 42,
"M.25-34": 22341,
"F.13-17": 177,
"U.45-54": 415,
"F.65+": 5916,
"F.55-64": 12172,
"M.13-17": 141,
"M.65+": 6576,
"F.35-44": 14491,
"U.25-34": 178,
"M.45-54": 17979,
"F.18-24": 5787
},
GraphQL is throwing errors because it can't accept these special characters; the full stop and the hyphen in the key names are causing issues. Is there a known way to parse these keys so the errors stop? Simply removing all the special characters (obviously) just returns null values.
Thanks in advance.

I have found a workaround.
I can return the current data as JSON, thanks to this Stack Overflow answer:
💡 Answer: Use 'scalar JSON' in your GraphQL query
GraphQL - Get all fields from nested JSON object
🤘
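For anyone who wants to see the idea in code, here is a minimal, purely illustrative sketch in Python using graphene (the field name and sample values are assumptions, not the original schema): declaring the field as a generic JSON scalar returns the whole map as one opaque value, so keys like "U.13-17" never have to be valid GraphQL field names.

import graphene
from graphene.types.generic import GenericScalar

class Query(graphene.ObjectType):
    # The whole stats map is exposed as one JSON-like scalar value.
    page_fans_gender_age = GenericScalar()

    def resolve_page_fans_gender_age(root, info):
        # Hypothetical loader: pass the third-party dict through untouched.
        return {"U.13-17": 1, "M.55-64": 11925, "F.45-54": 16443}

schema = graphene.Schema(query=Query)
result = schema.execute("{ pageFansGenderAge }")
print(result.data["pageFansGenderAge"])  # the dict comes back as-is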

Related

Different table_areas on a multi-page PDF

I would like to extract tables from a multi-page PDF. Because of the table properties, I need to pass flavor='stream' and table_areas to read_pdf for my tables to be detected properly. My problem is that the position of the table is different on each page (the first page has an address header and the others do not).
I have tried providing several areas to the read_pdf function, as follows:
camelot.read_pdf(file, pages='all', flavor='stream', table_areas=['60, 740, 580, 50','60, 470, 580, 50'])
but this results in 2 tables being detected per page. How can I specify the table_areas for each page separately?
I have also tried running read_pdf several times with different pages/table_areas, however I then cannot append the results together into a single object:
tables = camelot.read_pdf(file, pages='1', flavor='stream', table_areas=['60, 470, 580, 50'])
tables.append(camelot.read_pdf(file, pages='2-end', flavor='stream', table_areas=['60, 740, 580, 50']))
gives an error, as append is not a method of the resulting tables object.
Is there a way to concatenate the results of several calls to the read_pdf function?
Actually, as you noticed, you can't add items directly to the TableList object.
Instead, you can manipulate the TableList's _tables property (_tables is a plain list) in the following way:
my_tables = camelot.read_pdf(file, pages='1', flavor='stream', table_areas=['60, 470, 580, 50'])
second = camelot.read_pdf(file, pages='2-end', flavor='stream', table_areas=['60, 740, 580, 50'])
my_tables._tables.extend(second._tables)
Now my_tables should contain the tables from both calls.
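As a hedged follow-up sketch (the attribute names below are from the camelot API as I understand it; the output path is illustrative), the merged object can then be inspected or exported as one unit:

# Sanity-check the merged result: each entry is a camelot Table.
for i, table in enumerate(my_tables):
    print(i, table.shape)            # rows x columns of each detected table
    print(table.parsing_report)      # page, accuracy and whitespace metrics

# Export every table in one go (one CSV file per table).
my_tables.export('combined_tables.csv', f='csv')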

Error code: DelimitedTextMoreColumnsThanDefined Azure Data Factory

I am trying to copy data from a CSV file to a SQL table in Azure Data Factory.
These are the typeProperties of my dataset for the CSV file:
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"fileName": "2020-09-16-stations.csv",
"container": "container"
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
I receive the following error:
ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source '2020-09-16-stations.csv' with row number 2: found more columns than expected column count 11.,Source=Microsoft.DataTransfer.Common,'
This is row #2
0e18d0d3-ed38-4e7f,Station2,Mainstreet33,,12207,Berlin,48.1807,11.4609,1970-01-01 01:00:00+01,"{""openingTimes"":[{""applicable_days"":96,""periods"":[{""startp"":""08:00"",""endp"":""20:00""}]},{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""20:00""}]}]}"
I think the last column, the JSON value, is causing the trouble in this case. When I preview the data it looks fine.
I thought that "quoteChar": "\"" was exactly what would prevent the last column from causing problems. I have no idea why I am getting this error when I run debug.
Try setting the escape character to " (a double quote). This treats each pair of double quotes as a single literal quote character rather than as a quote char that ends the string, so you end up with a value that looks like this (and which the system knows is one string, not something it has to split):
{"openingTimes":[{"applicable_days":96,"periods":[{"startp":"08:00","endp":"20:00"}]},
{"applicable_days":31,"periods":[{"startp":"06:00","endp":"20:00"}]}]}
This is because the value "{""openingTimes"":[{""applicable_days"":96,""periods"":[{""startp"":""08:00"",""endp"":""20:00""}]},{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""20:00""}]}]}" contains several commas and your columnDelimiter is ",", which leads to that value being split into several columns. So you need to change your columnDelimiter.

Snowflake: searching a string in semi-structured data

I have a table with many columns and rows. One column that I am trying to query in Snowflake holds semi-structured data. For example, when I query
select response
from table
limit 5
This is what is returned
[body={\n "id": "xxxxx",\n "object": "charge",\n "amount": 500,\n "amount_refunded": 0,\n "application": null,\n "application_fee": null,\n "application_fee_amount": null,\n "balance_transaction": null,\n "billing_details": {\n "address": {\n "city": null,\n "zip": "xxxxx",]
I want to select only the zip from this data. When I run this code:
select response:zip
from table
limit 5
I get an error.
SQL compilation error: error line 1 at position 21 Invalid argument types for function 'GET': (VARCHAR(16777216), VARCHAR(11))
Is there a reason why this is happening? I am new to Snowflake, so I'm trying to parse out this data but I'm stuck. Thanks!
Snowflake has very good documentation on the subject.
For your specific case, have you attempted to use dot notation? It's the appropriate method for accessing nested JSON. So:
select response:body.zip
from table
Remember that you have your 'body' element. You need to access that one first with a colon, because it's a level 1 element. zip is located within body, so it's level 2. Level 1 elements are accessed with a colon; nested (level 2) elements are accessed with dot notation.
I think you have multiple issues with this.
First, I think your response column is not a VARIANT column. Please run the query below and confirm:
SHOW COLUMNS IN TABLE table;
Even if the column is a VARIANT, the data as stored is not valid JSON. You will need to strip out the JSON part and then store that in the VARIANT column.
Please do the first part and share the result; I will then suggest next steps. I wanted to put this in a comment, but comments do not allow this many sentences.
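To illustrate the "strip the JSON part" point outside of Snowflake, here is a hedged Python sketch (the sample value is shortened and illustrative): the stored text wraps the JSON in [body=...] and contains literal \n sequences, so it has to be cleaned up before it parses as JSON at all.

import json

# Shortened, illustrative version of the stored value.
raw = ('[body={\\n  "id": "xxxxx",\\n  "amount": 500,\\n  "billing_details": '
       '{\\n    "address": {\\n      "zip": "xxxxx"\\n    }\\n  }\\n}]')

inner = raw[len('[body='):-1].replace('\\n', '')  # drop the wrapper and the literal \n
doc = json.loads(inner)
print(doc['billing_details']['address']['zip'])   # -> xxxxx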

Django query to Postgres returning wrong column

I'm facing a strange problem, maybe related to some cache that I cannot find.
I have the following Models:
class Incubadores(models.Model):
    incubador = models.CharField(max_length=10, primary_key=True)
    posicion = models.CharField(max_length=10)

class Tareas(TimeStampedModel):
    priority = models.CharField(max_length=20, choices=PRIORITIES, default='normal')
    incubador = models.ForeignKey(Incubadores, on_delete=models.CASCADE, null=True, db_column='incubador')
    info = JSONField(null=True)
    datos = JSONField(null=True)

    class Meta:
        ordering = ('priority', 'modified', 'created')
I previously didn't have the db_column argument, so the Postgres column for that field was incubador_id.
I used db_column to change the name of the column, then ran python manage.py makemigrations and python manage.py migrate, but I'm still getting the key as incubador_id whenever I perform a query such as:
>>> tareas = Tareas.objects.all().values()
>>> print(tareas)
<QuerySet [{'info': None, 'modified': datetime.datetime(2019, 11, 1, 15, 24, 58, 743803, tzinfo=<UTC>), 'created': datetime.datetime(2019, 11, 1, 15, 24, 58, 743803, tzinfo=<UTC>), 'datos': None, 'priority': 'normal', 'incubador_id': 'I1.1', 'id': 24}, {'info': None, 'modified': datetime.datetime(2019, 11, 1, 15, 25, 25, 49950, tzinfo=<UTC>), 'created': datetime.datetime(2019, 11, 1, 15, 25, 25, 49950, tzinfo=<UTC>), 'datos': None, 'priority': 'normal', 'incubador_id': 'I1.1', 'id': 25}]>
I need to modify this column name because I'm having other issues with Serializers. So the change is necessary.
If I perform the same query on other models where I've also changed the default column name, the problem is exactly the same.
It happens both on the shell and on the code.
I've tried different queries to make sure it's not related to Django's lazy query system, but the problem is the same. I've also tried executing django.db.connection.close().
If I run a direct SQL query against PostgreSQL, there is no incubador_id column, only incubador, which is correct.
Does anyone have any idea what could be happening? I've already spent 2 days on this problem and I cannot find a reason :( It's a very basic operation.
Thanks!
This answer will explain why this is happening.
Django's built-in serializers don't have this issue, but probably won't yield exactly what you're looking for:
>>> from django.core import serializers
>>> serializers.serialize("json", Tareas.objects.all())
'[{"model": "inc.tareas", "pk": 1, "fields": {"priority": "normal", "incubador": "test-i"}}]'
You could use the fields attribute here, which seems like it would give you what you're looking for.
You don't specify what your "other issues with Serializers" are, but my suggestion would be to write custom serialization code. Relying on something like .values() or even serializers.serialize() is a bit too implicit for me; writing explicit serialization code makes it less likely you'll accidentally break a contract with a consumer of your serialized data if this model changes.
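As a hedged example of what that explicit code could look like (a sketch based on the model above, not the only way to do it), you can build the dictionaries yourself so the key is always incubador, regardless of the attribute name Django generates for the column:

def serialize_tarea(tarea):
    # tarea.incubador_id is the raw foreign-key value already loaded on the
    # instance, so no extra query is needed; the output key is whatever we choose.
    return {
        'id': tarea.id,
        'priority': tarea.priority,
        'incubador': tarea.incubador_id,
        'info': tarea.info,
        'datos': tarea.datos,
        'created': tarea.created.isoformat(),
        'modified': tarea.modified.isoformat(),
    }

payload = [serialize_tarea(t) for t in Tareas.objects.all()]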
Note: Please try to make the example you provide minimal and reproducible. I removed some fields to make this work with stock Django, which is why the serialized value is missing fields; the _id issue was still present without the third-party apps you're using, and was resolved with serializers. This also isn't specific to PG; it happens in sqlite as well.

PostgreSQL: Create Index in JSON Array

I am pretty new to PostgreSQL and not too familiar with SQL yet, but I'm trying to learn.
In my database I want to store huge JSON files (~2 million lines, 40 MB) and later query them as fast as possible. Right now it is too slow, so I figured indexing should do the trick.
The problem is that I do not know how to index the file, since it is a bit tricky. I have been working on it the whole day now and I'm starting to get desperate.
My table is called "replays" and the JSON column "replay_file".
So my files look like this:
"replay": [
{
"data": {
"posX": 182,
"posY": 176,
"hero_name": "CDOTA_Unit_Hero_EarthSpirit"
},
"tick": 2252,
"type": "entity"
},
{
"data": {
"posX": 123,
"posY": 186,
"hero_name": "CDOTA_Unit_Hero_Puck"
},
"tick": 2252,
"type": "entity"
}, ...alot more lines... ]}
I tried to get all the entries with, say, hero_name: Puck.
So I tried this:
SELECT * FROM replays r, json_array_elements(r.replay_file#>'{replay}') obj WHERE obj->'data'->>'hero_name' = 'CDOTA_Unit_Hero_Puck';
This works, but only for smaller files.
So I want to create an index, like this:
CREATE INDEX hero_name_index ON
replays ((json_array_elements(r.replay_file#>'{replay}')->'data'->'hero_name);
But it doesn't work. I have no idea how to reach that deep into the file and index this stuff.
I hope you understand my problem, since my English isn't the best, and can help me out here. I just don't know what else to try.
Kind regards and thanks a lot in advance,
Peter