Pandas DataFrame CSS: use a font from a TTF file - pandas

I am trying to create HTML from a dataframe, and I want to use a custom font from a TTF file. The code below is not working.
import pandas as pd
import dataframe_image as dfi

styles = [
    dict(selector="th", props=[("font-family", "Gotham"),
                               ("src", "url('gotham-bold.ttf')")]),
    dict(selector="td", props=[("font-family", "Gotham"),
                               ("src", "url('gotham-bold.ttf')")]),
    dict(selector="", props=[("font-family", "Gotham"),
                             ("src", "url('gotham-bold.ttf')")])
]

data = [
    {
        "name": "John",
        "gender": "Male"
    },
    {
        "name": "Martin",
        "gender": "Female"
    }
]

df = pd.json_normalize(data)
df = df.style.set_table_styles(styles).hide(axis='index')
df.to_html("test.html")
Can someone please suggest how to use the font src in pandas?
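In CSS, a src descriptor is only valid inside an @font-face rule, so attaching it to th/td selectors via set_table_styles has no effect. A minimal sketch of one possible approach (not a verified answer; it assumes gotham-bold.ttf sits next to the generated HTML file) is to render the styled table to an HTML string and prepend an @font-face declaration yourself:

import pandas as pd

data = [
    {"name": "John", "gender": "Male"},
    {"name": "Martin", "gender": "Female"},
]
df = pd.json_normalize(data)

# Only set font-family on the cells; the font file itself is registered below.
styles = [
    dict(selector="th", props=[("font-family", "Gotham")]),
    dict(selector="td", props=[("font-family", "Gotham")]),
]
table_html = df.style.set_table_styles(styles).hide(axis='index').to_html()

# Declare the TTF file via @font-face and write both parts to the output file.
font_face = """
<style>
@font-face {
    font-family: 'Gotham';
    src: url('gotham-bold.ttf') format('truetype');
}
</style>
"""

with open("test.html", "w") as f:
    f.write(font_face + table_html)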

Related

Error in loading nested and repeated data to BigQuery?

I am getting a JSON response from an API.
I want to get 5 columns from it, of which 4 are plain fields, but 1 column is of RECORD REPEATED type.
I want to load that data into a BigQuery table.
Below is my code, in which the schema is specified.
import requests
from requests.auth import HTTPBasicAuth
import json
from google.cloud import bigquery
import pandas
import pandas_gbq
URL = '<API>'
auth = HTTPBasicAuth('username', 'password')

# sending a GET request and saving the response as a response object
r = requests.get(url=URL, auth=auth)
data = r.json()
---------------------- JSON response ----------------
{
  "data": {
    "id": "jfp695q8",
    "origin": "taste",
    "title": "Christmas pudding martini recipe",
    "subtitle": null,
    "customTitles": [{
      "name": "editorial",
      "value": "Christmas pudding martini"
    }]
  }
}
id = data['data']['id']
origin = data['data']['origin']
title = data['data']['title']
subtitle = data['data']['subtitle']
customTitles = json.dumps(data['data']['customTitles'])
# print(customTitles)

df = pandas.DataFrame(
    {
        'id': id,
        'origin': origin,
        'title': title,
        'subtitle': 'subtitle',
        'customTitles': customTitles
    }, index=[0]
)
# df.head()

client = bigquery.Client(project='ncau-data-newsquery-sit')
table_id = 'sdm_adpoint.testfapi'

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("id", "STRING"),
        bigquery.SchemaField("origin", "STRING"),
        bigquery.SchemaField("title", "STRING"),
        bigquery.SchemaField("subtitle", "STRING"),
        bigquery.SchemaField(
            "customTitles",
            "RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("name", "STRING", mode="NULLABLE"),
                bigquery.SchemaField("value", "STRING", mode="NULLABLE"),
            ])
    ],
    autodetect=False
)

df.head()

job = client.load_table_from_dataframe(
    df, table_id, job_config=job_config
)
job.result()
customTitles is a RECORD REPEATED field with two keys, name and value, so I have built the schema accordingly.
Below is my table schema.
Below is the output of df.head():
jfp695q8 taste Christmas pudding martini recipe subtitle [{"name": "editorial", "value": "Christmas pudding martini"}]
Up to this point everything is fine.
But when I try to load the data into the table, it throws the error below:
ArrowTypeError: Could not convert '[' with type str: was expecting tuple of (key, value) pair
Can anyone tell me what's wrong here?
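The ArrowTypeError suggests that the customTitles column holds a JSON string (produced by json.dumps) rather than a list of dicts, so pyarrow cannot map it onto the REPEATED RECORD field. A minimal sketch of one possible fix, keeping the rest of the code above unchanged and assuming a google-cloud-bigquery/pyarrow combination that supports nested dataframe columns:

# Keep customTitles as a Python list of dicts instead of a JSON string,
# so load_table_from_dataframe can map it onto the REPEATED RECORD column.
customTitles = data['data']['customTitles']   # [{"name": "editorial", "value": "Christmas pudding martini"}]

df = pandas.DataFrame({
    'id': [id],
    'origin': [origin],
    'title': [title],
    'subtitle': [subtitle],
    'customTitles': [customTitles],   # one row whose value is a list of dicts
})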

How to load a jsonl file into BigQuery when the file has mixed data fields as columns

During my workflow, after extracting data from the API, the JSON has the following structure:
[
  {
    "fields": [
      {
        "meta": {
          "app_type": "ios"
        },
        "name": "app_id",
        "value": 100
      },
      {
        "meta": {},
        "name": "country",
        "value": "AE"
      },
      {
        "meta": {
          "name": "Top"
        },
        "name": "position",
        "value": 1
      }
    ],
    "metrics": {
      "click": 1,
      "price": 1,
      "count": 1
    }
  }
]
It is then stored as .jsonl and uploaded to GCS. However, when I load it into BigQuery for further extraction, automatic schema inference returns the following error:
Error while reading data, error message: JSON parsing error in row starting at position 0: Could not convert value to string. Field: value; Value: 100
I want to convert it into the following structure:

app_type | app_id | country | position | click | price | count
ios      | 100    | AE      | Top      | 1     | 1     | 1
Is there a way to define a manual schema in BigQuery to achieve this result, or do I have to preprocess the jsonl file before loading it into BigQuery?
One of the limitations of loading JSON data from GCS to BigQuery is that it does not support maps or dictionaries in JSON.
An invalid example would be:
"metrics": {
"click": 1,
"price": 1,
"count": 1
}
Your jsonl file should look something like this:
{"app_type":"ios","app_id":"100","country":"AE","position":"Top","click":"1","price":"1","count":"1"}
I already tested it and it works fine.
So wherever you handle the conversion of the JSON files to jsonl files and their upload to GCS, you will have to do some preprocessing (a sketch follows below).
You probably have two options:
- pre-create the target table with an app_id field of type INTEGER, or
- preprocess the JSON file and enclose 100 in quotes, like "100".
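For the preprocessing route, here is a rough sketch of flattening the original structure into one jsonl line per record. The column mapping (app_type from the app_id field's meta, position from its meta name) is read off the example above, and the file names raw.json / flat.jsonl are placeholders:

import json

def flatten(record):
    # Build one flat row of strings per record.
    row = {}
    for field in record["fields"]:
        if field["name"] == "app_id":
            row["app_type"] = field["meta"].get("app_type")
            row["app_id"] = str(field["value"])
        elif field["name"] == "position":
            # The desired output uses the meta name ("Top"), not the numeric value.
            row["position"] = field["meta"].get("name")
        else:
            row[field["name"]] = str(field["value"])
    for key, value in record["metrics"].items():
        row[key] = str(value)
    return row

with open("raw.json") as src, open("flat.jsonl", "w") as dst:
    for record in json.load(src):
        dst.write(json.dumps(flatten(record)) + "\n")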

pymongo: Document must be an instance of dict

Afternoon,
I'm facing a problem with pymongo: I'm not able to set up the parameter correctly to insert into MongoDB via insert_many(). I came across the following error:
TypeError: document must be an instance of dict, bson.son.SON,
bson.raw_bson.RawBSONDocument, or a type that inherits from
collections.MutableMapping [while running 'Insere no MongoDB']
What am I doing wrong?
class InsertMongoDB(beam.DoFn):
    def process(self, element):
        arqJson = json.loads(element)
        client = MongoClient("mongodb://user:password#mkp-cr-marketplace-core.lcr88.gcp.mongodb.net/db-poc-base360?retryWrites=true&w=majority%20")
        db = client['db-poc-base360']
        db.tbPropostaSucesso.insert_many(arqJson)
        # tbPropostaErro = db['tbPropostaErro']
        # tbPropostaErro
        resultado = 0
        yield resultado
I receive a message from Google Pub/Sub and forward it to a method called InsertMongoDB().
I don't know how to shape my message, whose value is in JSON format, so that it can be used correctly in insert_many().
When I debug, my variable "arqJson" contains the JSON that I'm using:
{
  "Status": "Sucesso ",
  "Documento": {
    "Apolice": [{
      "ItemAuto": [{
        "nmTipo": "FOX",
        "nrItem": "000001",
        "nmMarca": "VOLKSWAGEN",
        "aaModelo": "2017",
        "cdModelo": "0017664",
        "nmModelo": "TRENDLINE 1.0 FLEX 12V 5P",
        "aaFabricacao": "2016",
        "nmTipoVeiculo": "Hatch"
      }, {
        "nmTipo": "FOX",
        "nrItem": "000001",
        "nmMarca": "VOLKSWAGEN",
        "aaModelo": "2017",
        "cdModelo": "0017664",
        "nmModelo": "TRENDLINE 1.0 FLEX 12V 5P",
        "aaFabricacao": "2016",
        "nmTipoVeiculo": "Hatch"
      }],
      "ItemProp": [{
        "dsUF": "MG",
        "idLocal": "000001",
        "dsCidade": "BELO HORIZONTE",
        "dsEndereco": "RUA RUA RUA",
        "dsComplemento": "CASA"
      }],
      "cdEmpresa": "1",
      "idApolice": "501741",
      "idEndosso": "000000",
      "cdCarteira": "431",
      "cdSucursal": "010",
      "cdPatrimonio": "1",
      "nrItemContrato": "2",
      "dsTipoDocumento": "A",
      "cdVeiculoSegurado": "1"
    }],
    "Cliente": [{
      "cdCliente": "1",
      "nmCliente": "Lucas",
      "nrCpfCnpj": "4355582833",
      "icRegistroAtivo": "1",
      "cdAcaoInformacao": "A",
      "dtAcaoInformacao": "2020-02-02",
      "cdServicoAcaoInformacao": "cdServicoAcao",
      "cdUsuarioAcaoInformacao": "cdUsuarioAcao"
    }, {
      "cdCliente": "2",
      "nmCliente": "Lucas",
      "nrCpfCnpj": "43331971",
      "icRegistroAtivo": "1",
      "cdAcaoInformacao": "A",
      "dtAcaoInformacao": "2020-02-01",
      "cdServicoAcaoInformacao": "cdServicoAcao2",
      "cdUsuarioAcaoInformacao": "cdUsuarioAcao2"
    }],
    "Mensagem": [{
      "cdMensagem": "1",
      "dsMensagem": "Teste de mensagem"
    }],
    "EnderecoCobranca": [{
      "dsUF": "RS",
      "dsBairro": "INTEGRAÇÃO",
      "dsCidade": "PAROBE",
      "cdEndereco": 1,
      "dsEndereco": "RUA RUA RUA",
      "nrEndereco": "280",
      "dsComplemento": "",
      "icRegistroAtivo": "1",
      "cdAcaoInformacao": "A",
      "dtAcaoInformacao": "2020-02-02",
      "cdServicoAcaoInformacao": "cdServicoAcao",
      "cdUsuarioAcaoInformacao": "cdUsuarioAcao"
    }, {
      "dsUF": "SP",
      "dsBairro": "INTEGRAÇÃO2",
      "dsCidade": "POC2",
      "cdEndereco": 2,
      "dsEndereco": "RUA B",
      "nrEndereco": "222",
      "dsComplemento": "CASA 2",
      "icRegistroAtivo": "1",
      "cdAcaoInformacao": "A",
      "dtAcaoInformacao": "2020-02-01",
      "cdServicoAcaoInformacao": "cdServicoAcao2",
      "cdUsuarioAcaoInformacao": "cdUsuarioAcao2"
    }]
  }
}
2020/11/20 update:
At the moment I'm struggling with the format of arqJson that I need to use in insert_one(arqJson).
I forgot to mention that my method InsertMongoDB receives the arqJson from another method called InsertPostgreSQL.
InsertPostgreSQL does the following:
- receives the message from Pub/Sub;
- transforms the element: json.dumps(json.loads(element));
- saves it into arqJson. After that, InsertMongoDB is called.
At this moment, I don't know how to format "element" (whose type is list) and save it into arqJson, because I get this error:
raise TypeError("%s must be an instance of dict, bson.son.SON, "
TypeError: document must be an instance of dict, bson.son.SON,
bson.raw_bson.RawBSONDocument, or a type that inherits from
collections.MutableMapping [while running 'Insere no MongoDB']
Thank you,
Juliano
The solution is:
The first error is because your JSON contains a single document, not
multiple docs for an insert_many. If you use brackets like this,
db.tbPropostaSucesso.insert_many([arqJson]), converting it to a list
with a single element, it will work. Or you can try
insert_one(arqJson). – DaveStSomeWhere 5 hours ago
Thank you DaveStSomeWhere
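Based on that comment, a minimal sketch of the corrected DoFn (assuming element carries a single JSON document; the connection URI host is a placeholder for the one in the question):

import json

import apache_beam as beam
from pymongo import MongoClient

class InsertMongoDB(beam.DoFn):
    def process(self, element):
        # element is assumed to be the JSON string of one document
        arqJson = json.loads(element)
        client = MongoClient("mongodb://user:password@<host>/db-poc-base360?retryWrites=true&w=majority")
        db = client['db-poc-base360']
        db.tbPropostaSucesso.insert_one(arqJson)       # one dict -> insert_one
        # db.tbPropostaSucesso.insert_many([arqJson])  # or wrap it in a list
        yield 0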
I had the same problem, and what worked for me was adding retryWrites=false to the connection URL:
mongodb+srv://user:pass#server/etc...etc?retryWrites=false

How to export pandas data to elasticsearch?

It is possible to export pandas dataframe data to Elasticsearch using elasticsearch-py. For example, here is some code:
https://www.analyticsvidhya.com/blog/2017/05/beginners-guide-to-data-exploration-using-elastic-search-and-kibana/
There are a lot of similar methods like to_excel, to_csv, to_sql.
Is there a to_elastic method? If not, where should I request it?
The following script works for localhost:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

INDEX = "dataframe"
TYPE = "record"

def rec_to_actions(df):
    import json
    for record in df.to_dict(orient="records"):
        yield ('{ "index" : { "_index" : "%s", "_type" : "%s" }}' % (INDEX, TYPE))
        yield (json.dumps(record, default=int))

from elasticsearch import Elasticsearch
e = Elasticsearch()  # no args, connect to localhost:9200

if not e.indices.exists(INDEX):
    raise RuntimeError('index does not exist, use `curl -X PUT "localhost:9200/%s"` and try again' % INDEX)

r = e.bulk(rec_to_actions(df))  # returns a dict
print(not r["errors"])
Verify using curl -g 'http://localhost:9200/dataframe/_search?q=A:[29%20TO%2039]'.
There are many little things that can be added to suit different needs, but the main part is there.
I'm not aware of any to_elastic method integrated into pandas. You can always raise an issue on the pandas GitHub repo or create a pull request.
However, there is espandas, which allows importing a pandas DataFrame into Elasticsearch. The following example from the README has been tested with Elasticsearch 6.2.1.
import pandas as pd
import numpy as np
from espandas import Espandas
df = (100 * pd.DataFrame(np.round(np.random.rand(100, 5), 2))).astype(int)
df.columns = ['A', 'B', 'C', 'D', 'E']
df['indexId'] = (df.index + 100).astype(str)
INDEX = 'foo_index'
TYPE = 'bar_type'
esp = Espandas()
esp.es_write(df, INDEX, TYPE)
Retrieving the mappings with GET foo_index/_mappings:
{
  "foo_index": {
    "mappings": {
      "bar_type": {
        "properties": {
          "A": {"type": "long"},
          "B": {"type": "long"},
          "C": {"type": "long"},
          "D": {"type": "long"},
          "E": {"type": "long"},
          "indexId": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
You can use:
pip install es_pandas
pip install progressbar2
This package works on Python 3 (>= 3.4), and Elasticsearch should be version 5.x, 6.x or 7.x.
import time
import pandas as pd
from es_pandas import es_pandas

# Information of the es cluster
es_host = 'localhost:9200'
index = 'demo'

# create an es_pandas instance
ep = es_pandas(es_host)

# Example data frame
df = pd.DataFrame({'Alpha': [chr(i) for i in range(97, 128)],
                   'Num': [x for x in range(31)],
                   'Date': pd.date_range(start='2019/01/01', end='2019/01/31')})

# init a template if you want
doc_type = 'demo'
ep.init_es_tmpl(df, doc_type)

# Example of writing data to es, using the template you created
ep.to_es(df, index, doc_type=doc_type)

# set use_index=True if you want to use the DataFrame index as records' _id
ep.to_es(df, index, doc_type=doc_type, use_index=True)
Here is the documentation: https://pypi.org/project/es-pandas/
If es_pandas can't solve your problem, you could look at another solution: https://towardsdatascience.com/exporting-pandas-data-to-elasticsearch-724aa4dd8f62
You could use elasticsearch-py, or if you won't use elasticsearch-py you may find the answer to your question here => index-a-pandas-dataframe-into-elasticsearch-without-elasticsearch-py
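For completeness, here is a small sketch that uses elasticsearch-py's bulk helper directly; the index name "dataframe" and the toy dataframe are just examples, and recent clients expect the cluster URL to be passed explicitly:

import pandas as pd
from elasticsearch import Elasticsearch, helpers

df = pd.DataFrame({"A": [1, 2], "B": [3.5, 4.5]})

es = Elasticsearch("http://localhost:9200")

# One bulk action per row; each row dict becomes the document _source.
actions = (
    {"_index": "dataframe", "_source": row}
    for row in df.to_dict(orient="records")
)
helpers.bulk(es, actions)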

Nested pymongo queries (mlab)

I have some documents in mlab mongodb; the format is:
{
  "_id": {
    "$oid": "58aeb1d074fece33edf2b356"
  },
  "sensordata": {
    "operation": "chgstatus",
    "user": {
      "status": "0",
      "uniqueid": "191b117fcf5c"
    }
  },
  "created_date": {
    "$date": "2017-02-23T15:26:29.840Z"
  }
}
Database name: mparking_sensor
Collection name: sensor
I want to query in Python to extract only the status key-value pair and the created_date key-value pair.
My Python code is:
import sys
import pymongo

uri = 'mongodb://thorburn:tekush1!#ds157529.mlab.com:57529/mparking_sensor'
client = pymongo.MongoClient(uri)
db = client.get_default_database().sensor
print db

results = db.find()
for record in results:
    print(record["sensordata"], record['created_date'])
    print()

client.close()
This gives me everything under sensordata as expected, but dot notation gives me an error. Can somebody help?
PyMongo represents BSON documents as Python dictionaries, and subdocuments as dictionaries within dictionaries. To access a value in a nested dictionary:
record["sensordata"]["user"]["status"]
So a complete print statement might be:
print("%s %s" % (record["sensordata"]["user"]["status"], record['created_date']))
That prints:
0 {'$date': '2017-02-23T15:26:29.840Z'}
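If you only need those two fields back from the server, a projection is another option; this sketch reuses the db collection handle from the question:

# Return only the nested status and created_date, and drop _id.
results = db.find(
    {},
    {"sensordata.user.status": 1, "created_date": 1, "_id": 0}
)
for record in results:
    print(record["sensordata"]["user"]["status"], record["created_date"])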