Azure Data Factory HTTP Connector Data Copy - Bank Of England Statistical Database - azure-data-factory-2

I'm trying to use the HTTP connector to read a CSV of data from the BoE statistical database.
Take the SONIA rate for instance.
There is a download button for a CSV extract.
I've converted this to the following URL which downloads a CSV via web browser.
[https://www.bankofengland.co.uk/boeapps/database/_iadb-fromshowcolumns.asp?csv.x=yes&Datefrom=01/Dec/2021&Dateto=01/Dec/2021 &SeriesCodes=IUDSOIA&CSVF=TN&UsingCodes=Y][1]
Putting this in the Base URL it connects and pulls the data.
I'm trying to split this out so that I can parameterise some of it.
Base
https://www.bankofengland.co.uk/boeapps/database
Relative
_iadb-fromshowcolumns.asp?csv.x=yes&Datefrom=01/Dec/2021&Dateto=01/Dec/2021 &SeriesCodes=IUDSOIA&CSVF=TN&UsingCodes=Y
It won't fetch the data, however when it's all combined in the base URL it does.
I've tried to add a "/" at the start of the relative URL as well and that hasn't worked either.
According to the documentation ADF puts the "/" in for you "[Base]/[Relative]"
Does anyone know what I'm doing wrong?
Thanks,
Dan
[1]: https://www.bankofengland.co.uk/boeapps/database/_iadb-fromshowcolumns.asp?csv.x=yes&Datefrom=01/Dec/2021&Dateto=01/Dec/2021 &SeriesCodes=IUDSOIA&CSVF=TN&UsingCodes=Y

I don't see a way you could download that data directly as a csv file. The data seems to be manually copied from the site, using their Save as option.
They have used read-only block and hidden elements, I doubt there would any easy way or out of the box method within ADF web activity to help on this.
You can just manually copy-paste into a csv file.

Related

airbyte ETL ,connection between http API source and big query

i have a task in hand, where I am supposed to create python based HTTP API connector for airbyte. connector will return a response which will contain some links of zip files.
each zip file contains csv file, which is supposed to be uploaded to the bigquery
now I have made the connector which is returning the URL of the zip file.
The main question is how to send the underlying csv file to the bigquery ,
i can for sure unzip or even read the csv file in the python connector, but i am stuck on the part of sending the same to the bigquery.
p.s if you guys can tell me even about sending the CSV to google cloud storage, that will be awesome too
When you are building an Airbyte source connector with the CDK your connector code must output records that will be sent to the destination, BigQuery in your case. This allows to decouple extraction logic from loading logic, and makes your source connector destination agnostic.
I'd suggest this high level logic in your source connector's implementation:
Call the source API to retrieve the zip's url
Download + unzip the zip
Parse the CSV file with Pandas
Output parsed records
This is under the assumption that all CSV files have the same schema. If not you'll have to declare one stream per schema.
A great guide, with more details on how to develop a Python connector is available here.
Once your source connector outputs AirbyteRecordMessages you'll be able to connect it to BigQuery and chose the best loading method according to your need (Standard or GCS staging).

Can Apache solr stores actual files which are uploaded on it?

This is my first time on Stack Overflow. Thanks to all for providing valuable information and helping one another.
I am currently working on Apache Solr 7. There is a POC I need to complete as I have less time so putting this question here. I have setup SOLR on my windows machine. I have created core and uploaded a PDF document using /update/extract from Admin UI. After uploading, I can see the metadata of the file if I query from the Admin UI using query button. I was wondering if I can get the actusl content of the PDF as well. I can see there is one tlog file gets generated under /data/tlog/tlog000... with raw PDF data but not the actual file.
So the question are,
1. Can I get the PDF content?
2. does Solr stores the actual file somewhere?
a. If it stores then where it does?
b. If it does not store then, is there a way to store THE FILE?
Regards,
Munish Arora
Solr will not sore the actual file anywhere.
Depending on your config it can store the binary content though.
Using the extract request handler Apache Solr relies on Apache Tika[1] to extract the content from the document[2].
So you can search and return the content of the pdf and a lot of other metadata if you like.
[1] https://tika.apache.org/
[2] https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html

Jedox - How to export SOAP Log into Excel file

I'm pretty new on Jedox, and I'm trying for internal use to export some specific "warning logs" (eg. "(Mapping missing)" ) into an excel/wss file.
And I don't how to do that...
Can you help me please.
Regards,
Usik
The easiest way to get these information is to use the Integrator in Jedox.
There you have the possibility to use a File Extract and then you can filter the information you are searching for.
After that it's possible to load these filtered information into a File.
The minimum steps you'll need are Connection -> Extract -> Transform -> Load.
Please take a look at the sample projects that are delivered with the Jedox software. In the example "sampleBiker", there are also file connections, extracts etc.
You can find more samples in:
<Install_path>\tomcat\webapps\etlserver\data\samples
I recommend to check the Jedox Knowledgebase.
The other way (and maybe more flexible way) would be to use, for example, a PHP macro inside of a Jedox Web report and read the log file you're trying to display.
If you've got a more specific idea what you'd like to do, please let me know and I'll try to give you an example how to do so.

Is there a way to import backups in NiFi?

Using NiFi v0.6.1 is there a way to import backups/archives?
And by backups I mean the files that are generated when you call
POST /controller/archive using the REST api or "Controller Settings" (tool bar button) and then "Back-up flow" (link).
I tried unzipping the backup and importing it as a template but that didn't work. But after comparing it to an exported template file, the formats are reasonably different. But perhaps there is a way to transform it into a template?
At the moment my current work around is to not select any components on the top level flow and then select "create template"; which will add a template with all my components. Then I just export that. My issue with this is it's a bit more tricky to automate via the REST API. I used Fiddler to determine what the UI is doing and it first generates a snippet that includes all the components (labels, processors, connections, etc.). Then it calls create template (POST /nifi-api/contorller/templates) using the snippet ID. So the template call is easy enough but generating the definition for the snippet is going to take some work.
Note: Once the following feature request is implemented I'm assuming I would just use that instead:
https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
The entire flow for a NiFi instance is stored in a file called flow.xml.gz in the conf directory (flow.xml.tar in a cluster). The back-up functionality is essentially taking a snapshot of that file at the given point in time and saving it to the conf/archive directory. At a later point in time you could stop NiFi and replace conf/flow.xml.gz with one of those back-ups to restore the flow to that state.
Templates are a different format from the flow.xml.gz. Templates are more public facing and shareable, and can be used to represent portions of a flow, or the entire flow if no components are selected. Some people have used templates as a model to deploy their flows, essentially organizing their flow into process groups and making template for each group. This project provides some automation to work with templates: https://github.com/aperepel/nifi-api-deploy
You just need to stop NiFi, replace the nifi flow configuration file (for example this could be flow.xml.gz in the conf directory) and start NiFi back up.
If you have trouble finding it check your nifi.properties file for the string nifi.flow.configuration.file= to find out what you've set this too.
If you are using clustered mode you need only do this on the NCM.

Merge two Endeca Servers (Endeca 3.1) into one. Including their current data

Let me explain in more detail:
1st: I'm running endeca 3.1, so Endeca Server here refers to 3.0's Data Domain.
I'm required to use an Endeca Server currently present on Endeca (Downloaded a Demo VM). All the info on it, including, groups, attributes and data, must be merged into out Endeca Server. (It can also be the other way around, i could merge my Endeca Server into this one.)
So far, i've tried to do the following:
1) Clone the Endeca Server
2) use the putCollection sconfig operation to create a collection on it with the same name i have on mine.
3) Load configurations using the LoadCollection & LoadAttributes graphs from OEID POC Template 3.1. I point to the new collection on the Configuration.xls file.
This is where i encounter an issue. The LoadAttributes graph gets a T/O message from the server's WS. Then the config WSDL becomes inaccesible for a while. I can't go beyond this point.
I've been able to load data into the collection, but i need to load the attributes first.
THanks in advance for your replies.
Regards
There are a few techniques.
Have you tried exporting the data domain and then importing it?
You can use the endeca-cmd tools to export to a file, and then import from that file. This would enable you to add 2 datastores into one server.
If you want to combine 2 datastores then that is a different question.
The simplest approach in 3.1 if the data collections are small. Extract then as CSV (via a data-table), convert to XLS and add them via self provisioning into separate collections within a single data store. If you are running in the VM this is potentially the easiest approach.
This can also be done using Integrator.
You don't need to load the attributes unless you are using multi-value types. You can call against the conversation web-service to extract data and then load it using 'bulk-load' I would not worry too much about creating the attributes unless this becomes essential due to their type or complexity. If you cannot call against the conversation web-service, then again extract as csv and load using Integrator.