How to fetch a file using NiFi from a file share location? - azure-storage

I am trying to fetch some files from a file share location in an Azure storage account using NiFi. I tried the FetchFTP processor, but it is not able to connect to the URL of the file share location. I put in what I believe are the correct parameters, but I am getting an error:
Failed to fetch file on remote host due to java.net.UnknownHostException

Cannot comment yet, so answering here. Two things:
1. UnknownHostException strongly indicates that the hostname you entered is wrong or cannot be resolved by the NiFi host.
2. Azure Files in general does not work with FTP, because it speaks the SMB protocol (or a REST API). You could mount the file share directly on the NiFi machines instead; this has to be done on every node. Once the mount works, you can use the ListFile and FetchFile (or GetFile) processors in NiFi, as sketched below.
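For reference, a rough sketch of mounting an Azure file share on a Linux NiFi node over SMB (storage account name, share name, and key are placeholders; requires the cifs-utils package):

sudo mkdir -p /mnt/myshare
sudo mount -t cifs //mystorageacct.file.core.windows.net/myshare /mnt/myshare \
  -o vers=3.0,username=mystorageacct,password=<storage-account-key>,dir_mode=0777,file_mode=0777,serverino

Once mounted, point ListFile/FetchFile at /mnt/myshare.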

Related

CSV transfer from SFTP server to S3 via WinSCP

My workplace uses WinSCP for SFTP transfers. We have some data coming in this way each week, and I would like to get it into an S3 bucket. We want to automate this transfer with a cron job or something similar.
I know there are AWS tools, but they cost money and money can't be spent. We also do not have an ETL tool like Alteryx, otherwise I would use it. Nothing on the internet gives much detail about transferring files from an SFTP server to another server; mostly I find guides on transferring from a server to a local machine.
Below is the code I have found.
Can these WinSCP commands be used to transfer to an S3 bucket somehow at the 'put' step? (I cannot use the generator like other posts have suggested, because I do not have access to our AWS account or any buckets yet.) This is all about proving a concept.
# Connect to SFTP server using a password
open sftp://user:password@example.com/ -hostkey="ssh-rsa 2048 xxxxxxxxxxx...="
# Upload file (THIS IS WHERE I WOULD WANT S3 PATH SYNTAX)
put d:\examplefile.txt /home/user/
# Exit WinSCP
Exit
Once I have this command working, we can then create a Windows scheduled task, from what I read. This would automate moving the file to where we need it, and we can then do more with the file where SFTP servers limit us.
If I understand the question correctly, you are asking how to transfer files directly from SFTP server to S3 using a script running on yet another machine.
It's not possible (unless AWS has a feature for that, but then it won't be free). You have to download the files from the SFTP server and then upload them to S3.
With WinSCP scripting, you can do it with a script like:
open sftp://username:password@sftp.example.com/
get /sftp/path/*
exit
open s3://accesskey:secretkey@s3.amazonaws.com/
put * /bucket/
exit
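To run and schedule the script once it works, something like the following should do. This is only a sketch: transfer.txt is a hypothetical file holding the WinSCP commands above, and the paths will differ on your machine.

winscp.com /script=C:\jobs\transfer.txt /log=C:\jobs\transfer.log
schtasks /create /tn SftpToS3 /tr "winscp.com /script=C:\jobs\transfer.txt" /sc weekly /st 02:00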
You can mount your bucket on your server with s3fs (a FUSE filesystem), like a normal hard disk.
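A rough example of such a mount, assuming an access key/secret pair and a bucket named my-bucket (all placeholders):

echo ACCESS_KEY:SECRET_KEY > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs
s3fs my-bucket /mnt/s3 -o passwd_file=~/.passwd-s3fs

After that, copying a file into /mnt/s3 with normal file commands uploads it to the bucket.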

S3 to FTP server without downloading to EC2

Is there a way to programmatically transfer a file from an S3 bucket to an external FTP server without downloading it to an EC2 instance?
More details:
1. I have a Django server running on EC2 which serves an Angular web app.
2. A user uploads a file to an S3 bucket using my web app, and once the upload is complete the web app sends a POST request containing the file object's S3 URL.
3. The Django server, upon receiving the POST request, may need to copy the file (uploaded to S3) to an external FTP server. The target FTP server may differ depending on the user who uploaded the file (each user group may have its own FTP server).
4. I understand that upon receiving the POST request, the Django server can download the file from S3 and then upload it to the appropriate target FTP server.
My question is: can I reduce the overhead on my EC2 instance in step 4 by somehow initiating a transfer from S3 to the target FTP server and getting a callback/notification when that transfer completes (success or error)?
Thanks.
You can create a Lambda function to do this.
A complete reference implementation is discussed here:
https://pythonvibes.wordpress.com/2016/12/09/ftp-and-sftp-through-lambda/
Hope it helps.
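If the Lambda route is not an option and the copy does go through the Django/EC2 box, one way to keep the overhead low is to stream the object rather than saving it to disk first. A hedged sketch using the AWS CLI and curl (bucket, key, and FTP details are placeholders):

aws s3 cp s3://my-bucket/uploads/report.csv - | curl -T - ftp://ftpuser:ftppass@ftp.example.com/incoming/report.csv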

How to move files from a remote server to S3 at the command line

I have a lot of big files on a remote server and I want to move them into S3. I want to do it at the command line or with a bash script (i.e., I do NOT want to use a GUI app like Cyberduck) so that I can automate/replicate the process.
I have tried mounting my remote server onto my local machine using OSXFUSE and sshfs and then pushing the files to S3 using s3cmd. This does work, but I keep running into errors (the connection being lost for no apparent reason, mount errors, etc.).
Is this the best way to do it? Does anyone know a better way?
Thanks,
A
You can use the MinIO client (mc) to do this.
$ mc cp --recursive localDir/ s3/remoteBucket
In case of a network disconnection, mc gives you the option to resume the upload.
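Note that mc needs the S3 endpoint and keys registered first. On recent releases that looks roughly like this (keys are placeholders; older versions use mc config host add instead):

mc alias set s3 https://s3.amazonaws.com ACCESS_KEY SECRET_KEY
mc cp --recursive localDir/ s3/remoteBucket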
Is your remote server in EC2? Your current setup requires two copies (first pulling the data to your local machine via sshfs, then pushing it to S3 via s3cmd); if you run s3cmd on your remote server directly, you can reduce that to one.
If you want to mount S3 as a filesystem, you can also use tools like goofys or s3fs. Again, you should do that on your remote server to avoid the extra copy; both options are sketched below.
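For illustration, both options run on the remote server itself (bucket name and paths are placeholders, and s3cmd/goofys are assumed to already be configured with your keys):

# push straight from the remote server, so the data moves only once
s3cmd put --recursive /data/bigfiles/ s3://my-bucket/bigfiles/

# or mount the bucket there and use ordinary file commands
goofys my-bucket /mnt/s3
cp -r /data/bigfiles /mnt/s3/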

How to upload files directly to Amazon S3 from a remote server?

Is it possible to upload a file to S3 from a remote server?
The remote server is basically a URL-based file server. For example, using http://example.com/1.jpg, it serves the image. It doesn't do anything else, and I can't run code on that server.
Is it possible to have another server tell S3 to upload a file from http://example.com/1.jpg?
upload from http://example.com/1.jpg
server -------------------------------------------> S3 <-----> example.com
If you can't run code on the server or execute requests then, no, you can't do this. You will have to download the file to a server or computer that you own and upload from there.
You can see the operations you can perform on amazon S3 at http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html
Checking the operations for both the REST and SOAP APIs you'll see there's no way to give Amazon S3 a remote URL and have it grab the object for you. All of the PUT requests require the object's data to be provided as part of the request. Meaning the server or computer that is initiating the web request needs to have the data.
I have had a similar problem in the past where I wanted to download my users' Facebook thumbnails and upload them to S3 for use on my site. The way I did it was to download the image from Facebook into memory on my server, then upload it to Amazon S3; the whole thing took under 2 seconds. After the upload to S3 was complete, I wrote the bucket/key to a database.
Unfortunately there's no other way to do it.
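If you do have shell access on the machine doing the copy, the same pull-then-push idea can even be done as a stream, without writing the file to disk (the URL and bucket below are placeholders, and the AWS CLI is assumed to be configured):

curl -s http://example.com/1.jpg | aws s3 cp - s3://my-bucket/images/1.jpg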
I think the suggestion provided is quite good: you can SCP the file to an EC2 instance and have it put the file into the S3 bucket. Using the .pem file gives you password-less authentication, and a PHP script can validate the extensions and pass the file as an argument to the SCP command.
The only problem with this solution is that you must have your instance in AWS. You can't use it if your website is hosted with another hosting provider and you are trying to upload files straight to the S3 bucket.
Technically it's possible using AWS Signature Version 4. Treating your remote server as the client in that flow, you could prepare a form on the main server and send the form fields to the remote server for it to curl. Detailed example here.
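To make that concrete, a rough sketch of what the remote server would curl once the main server has generated the policy and signature (every value below is a placeholder produced by the main server):

curl https://my-bucket.s3.amazonaws.com/ \
  -F key=uploads/1.jpg \
  -F x-amz-algorithm=AWS4-HMAC-SHA256 \
  -F x-amz-credential=ACCESS_KEY/20200101/us-east-1/s3/aws4_request \
  -F x-amz-date=20200101T000000Z \
  -F policy=BASE64_ENCODED_POLICY \
  -F x-amz-signature=HEX_SIGNATURE \
  -F file=@1.jpg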
You can use the scp command from the terminal.
1) Using the terminal, go to the directory containing the file you want to transfer to the server.
2) Type this:
scp -i yourAmazonKeypairPath.pem fileNameThatYouWantToTransfer.php ec2-user@ec2-00-000-000-15.us-west-2.compute.amazonaws.com:
N.B. Add "ec2-user@" before the ec2-... hostname you got from the EC2 website! This is such a picky error!
3) Your file will be uploaded and the progress will be shown. When it reaches 100%, you are done!

What is the fastest way to upload big files to the server

I have got a dedicated server and a file of about 4 GB to upload to it. What is the fastest and safest way to upload that file to the server?
FTP may create issues if the connection is broken.
SFTP will have the same issue as well.
Is your own computer reachable through a public internet IP as well?
In that case you could try to set up a simple HTTP server (if you have Windows, just set up IIS) and then use a download manager on the dedicated server (which one depends on the OS) to download the file over HTTP (it can use multiple streams for that), or do this through a torrent.
There are trackers, like http://openbittorrent.com/, which will allow you to keep the file on your computer and then use a torrent client to pull the file onto the dedicated server.
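As a rough illustration of the HTTP route (the tools named here are just examples; any simple web server and multi-connection download manager would do):

# on your computer: serve the folder containing the file over HTTP on port 8000
python3 -m http.server 8000

# on the dedicated server: pull it with several parallel connections, resumable
aria2c -x 8 http://YOUR_PUBLIC_IP:8000/bigfile.bin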
I'm not sure what OS your remote server is running, but I would use wget; it has a --continue option. From the man page:
--continue
Continue getting a partially-downloaded file. This is useful when
you want to finish up a download started by a previous instance of
Wget, or by another program. For instance:
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
If there is a file named ls-lR.Z in the current directory, Wget
will assume that it is the first portion of the remote file, and
will ask the server to continue the retrieval from an offset equal
to the length of the local file.
wget binaries are available for GNU/Linux, Windows, Mac OS X, and DOS:
http://wget.addictivecode.org/FrequentlyAskedQuestions?action=show&redirect=Faq#download