I'm trying to create a Twitter bot that gives daily news about random subjects.
I want to import the daily news from Mediastack, and it works pretty well, but I receive something like this (I'm French, which is why the article is in French and contains these strange characters):
Article(author=None, title='La pand�mie motive les voleurs: Une nette hausse
des escroqueries financi�res constat�e', description='L�instance de m�diation
des banques suisses a trait� 2175 nouveaux cas en 2020, ce qui repr�sente une
hausse de 8% par rapport � l�ann�e pr�c�dente.', url='https://www.tdg.ch/une-
nette-hausse-des-escroqueries-financieres-constatee-333483840132',
image='https://cdn.unitycms.io/image/ocroped/400,400,1000,1000,0,0/s7bDqn27jGQ/FU0
Tdt5Oqhp9VzkeyvqdPJ.jpg', category='general', language='fr', country='ch',
published_at='2021-07-01T12:17:02+00:00', source='Tribune de Geneve')
I would like to put all this information into a dict so I can use what I need. I would also like to change the strange characters into the accented letters (like "é") they should be.
Edit: I changed the API I was using, and now I get a dict directly from the request.
Thanks!
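For reference, the two issues are usually related: the replacement characters appear when the response bytes are decoded with the wrong codec, and parsing the raw bytes as UTF-8 JSON yields both the dict and the correct accents in one step. A minimal, self-contained sketch (the hard-coded bytes below stand in for the API response body):

```python
import json

# The article text as the API presumably sends it: UTF-8-encoded JSON.
raw = '{"title": "La pandémie motive les voleurs"}'.encode("utf-8")

# Decoded with the wrong codec and errors="replace", the accents turn
# into the replacement characters shown in the question:
broken = raw.decode("ascii", errors="replace")

# Decoded as UTF-8 and parsed, the result is a plain dict with the
# accents intact:
article = json.loads(raw.decode("utf-8"))
print(article["title"])  # La pandémie motive les voleurs
```

With the requests library, setting `resp.encoding = "utf-8"` before calling `resp.json()` has the same effect on a live response.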
Context
I am trying to get the official names (sometimes called "formal" or "full" names) for every country in as many languages as possible. I'm essentially looking for the exonymic versions of the government name.
The United Nations provides such data in all of its six working languages (plus Portuguese!?), as does the EU in all of its member nations' languages.
I was hoping to augment these lists with data from WikiData.
A working example:
Query
SELECT ?official_name (LANG(?official_name) AS ?lang)
WHERE {
  # Q30 = United States
  wd:Q30 wdt:P1448 ?official_name .
}
Output
official_name                   | lang
United States                   | en
Vereinigte Staaten von Amerika  | de
the United States of America    | en
Unuiĝintaj Ŝtatoj de Ameriko    | eo
Estados Unidos de América       | es
États-Unis d’Amérique           | fr
Stati Uniti d'America           | it
Verenigde Staten van Amerika    | nl
Statele Unite ale Americii      | ro
Сједињене Америчке Државе       | sr
Amerika Birleşik Devletleri     | tr
However, some countries, despite having "official name" entries in multiple languages, only return one result. These include (non-exhaustive list):
Q183: Germany (in 'de')
Q148: China (in 'zh-hans')
At first, I thought the query might return ALL official names only when the country does not have an official language (English is the de facto official language of the United States, but not de jure). However, Finland (Q33) has two official languages, yet returns nine entries as of 2022-05-19 (including French, which cannot possibly be an official minority language in Finland).
Question
Am I doing something wrong? Is there another way I could form this query?
There was some discussion about the flaws of this property, albeit nothing fruitful: https://www.wikidata.org/wiki/Property_talk:P1448
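One thing worth checking with queries like the one above: the wdt: prefix only walks best-rank statements, so if a single official-name statement on an item is marked as preferred rank, the other statements are hidden from the results. Walking the full statement nodes with the p:/ps: prefixes returns every statement regardless of rank; a sketch using Germany:

```sparql
SELECT ?official_name (LANG(?official_name) AS ?lang)
WHERE {
  # Q183 = Germany; p:/ps: reaches all statements, not just best-rank ones
  wd:Q183 p:P1448 ?stmt .
  ?stmt ps:P1448 ?official_name .
}
```

If an item still returns a single row with this form, the remaining translations are presumably stored as labels or aliases rather than as P1448 statements.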
I'm learning to use the Amadeus API...
I'm able to search flights using "flight-offers-search", but as the title states, if I restrict results to American Airlines (AA), it returns nothing.
There absolutely are AA flights from DFW on the specified day (I'm on one), so I'm not sure why it would fail.
So far I am unable to return ANY flights on ANY day, if "includedAirlineCodes=AA" is specified.
What is special about American Airlines? What am I missing?
url <- "https://test.api.amadeus.com/v2/shopping/flight-offers?originLocationCode=DFW&destinationLocationCode=SAN&departureDate=2021-09-03&travelClass=ECONOMY&adults=1&max=5&currencyCode=USD&includedAirlineCodes=AA"
Content from American Airlines is not included in the Self-Service APIs as described in the API overview.
I have a dataset that returns a BLOB field (that's how BIRT has bound it in the table). In the database the data type is Long Raw, so I need to transform the binary data to text using a generic convert function.
The problem is that BIRT does not appear to recognize embedded RTF expressions after the conversion, but maybe I'm doing something wrong.
I was using a Dynamic Text component that contains the converted data in the Expression Builder property. The content type of that field is also set to RTF.
Here is what BIRT shows:
{\rtf1\ansi
\ansicpg1252\deff0{\fonttbl{\f0\fnil MS
Sans Serif;}{\f1\fnil\fcharset0 MS Sans
Serif;}}
\viewkind4\uc1\pard\qc\lang1046\b
\f0\fs16 1 x\f1\'ed\-cara de leite
\par 1 colher de sopa de fermendo em p
\'f3
\par 3 x\'ed\-caras de farinha de trigo
\par 3 x\'ed\-caras de a\'e7\'facar
\par 3 ovos
\par 4 colheres de margarina\b0\f0
\par }
As we can see, the text contains RTF tags mixed with the main content.
The idea is to make BIRT delete the tags, or to be able to handle them in some way.
Here is the output I was expecting:
1 xícara de leite
1 colher de sopa de fermento
3 xícaras de farinha de trigo
After some research there is a possible answer, but it is not the perfect one, because the goal was to handle the RTF tags in some way. Here it is:
The first step is to convert the binary data:
function convert(byteArr) {
    // Build the string from the raw byte values one character at a time.
    var converted = "";
    for (var i = 0; i < byteArr.length; i++) {
        converted += String.fromCharCode(byteArr[i]);
    }
    return converted;
}
The next step is to delete all RTF tags using regular expressions. This solution is based on this post: Regular Expression for extracting text from an RTF string.
function removeRTF (str) {
var basicRtfPattern = /\{\*?\\[^{}]+;}|[{}]|\\[A-Za-z]+\n?(?:-?\d+)?[ ]?/g;
var newLineSlashesPattern = /\\\n/g;
var ctrlCharPattern = /\n\\f[0-9]\s/g;
return str
.replace(ctrlCharPattern, "")
.replace(basicRtfPattern, "")
.replace(newLineSlashesPattern, "\n")
.replace(/\\'c9/g,"É")
.replace(/\\'cd/g,"Í")
.replace(/\\'ed\\-/g,"í")
.replace(/\\'f3/g,"ó")
.replace(/\\'d3/g,"Ó")
.replace(/\\'fa/g,"ú")
.replace(/\\'da/g,"Ú")
.replace(/\\'e7/g,"ç")
.replace(/\\'e1/g,"á")
.replace(/\\'e0/g,"à")
.replace(/\\'c0/g,"À")
.replace(/\\'c1/g,"Á")
.trim();
}
Note that each accented character has to be mapped individually.
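Instead of mapping every accent by hand, the \'hh escapes can be decoded generically: in RTF, the two hex digits are the character's byte value in the document's code page (\ansicpg1252 here, which matches the Unicode code points for the accented Latin range). A sketch of such a decoder:

```javascript
function decodeRtfEscapes(str) {
    // \'hh is an 8-bit character in the RTF document's code page; for
    // cp1252 in the accented Latin range, the byte value equals the
    // Unicode code point, so fromCharCode recovers the character.
    return str.replace(/\\'([0-9a-fA-F]{2})/g, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}
```

This replaces the long chain of individual .replace calls for accents, though the tag-stripping patterns are still needed.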
The old ROM specification of BIRT shows that once upon a time there were plans to support RTF formatted text, but it was never implemented (and never will be implemented).
The de-facto standard for formatted text coded in a text file is now HTML.
I need to show a FileName (NomFichier in French) only if there is a file attached to the report.
Here you can see part of the XML, where PiecesJointes means attached files; it contains multiple types of files. The one that interests us is ResolutionCA.
<PiecesJointes>
<AttestationRQ>
<InfoFichierJoint>
<NomFichier>readme.txt</NomFichier>
</InfoFichierJoint>
</AttestationRQ>
<ResolutionCA>
<InfoFichierJoint>
<NomFichier>test.txt</NomFichier>
</InfoFichierJoint>
</ResolutionCA>
<FormulaireCautionnement>
<InfoFichierJoint>
<NomFichier>NW2W014_20210504075509_readme.txt</NomFichier>
</InfoFichierJoint>
</FormulaireCautionnement>
</PiecesJointes>
My question:
Let's say I want to check whether the NomFichier (file name) inside the InfoFichierJoint tag, which is itself inside the ResolutionCA tag, exists or has a file name written in it. What do I have to do?
Here you can see what I tried without success, which leads me to think that the nesting of the tags is the problem.
=IIF(IsNothing(Fields!NomFichier.Value)= "true" OR Fields!NomFichier.Value ="", "Les conditions de votre demande ne requièrent aucun document. ",
"Résolution de la personne morale, société ou autre entité qui autorise le répondant à présenter la demande de permis")
I am quite new to R and quanteda. I'm trying to create a dataframe of the people who voted for and against a legislative proposal, based on parliamentary transcripts. I can't figure out how to do this, and some help would be greatly appreciated.
The following is an example of what the text could look like:
In favour: van Vliet, Nolens, Bruinmelkanip, Krap, Travagliuo and Lucasse.
Those voted against: Verhey, ter Laan, van Gijn, PÜnacker Hordijk, Röell. Troelstra, Drucker, Schaper en Fox.
I want to write it as a function so I can indicate the starting and ending word of each section and then build the dataframe with all the names. As a function, it could be applied to multiple such pieces of text.
Thank you!
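For what it's worth, the extraction step is plain string work and independent of quanteda. A minimal sketch of the logic, shown here in Python (the marker phrase "In favour:" and the closing period are assumptions about the transcript format), which could be ported to an R function taking the start marker as an argument:

```python
import re

def extract_names(text, start_marker):
    """Return the list of names between start_marker and the next period."""
    # Grab the span after the marker phrase, up to the closing period.
    m = re.search(re.escape(start_marker) + r"\s*(.*?)\.", text)
    if m is None:
        return []
    # Split the span on commas and on the final "and"/"en" conjunction.
    parts = re.split(r",\s*|\s+(?:and|en)\s+", m.group(1))
    return [p.strip() for p in parts if p.strip()]
```

Applied to the first example line with `extract_names(line, "In favour:")`, this yields one list entry per voter, which can then be stacked into a dataframe with the vote as a second column.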