Recently, I had to work on a Lambda function that performs an FTP/SFTP transfer whenever a file is dropped into an S3 bucket. Being new to Lambda, I had to piece together information from various links to get the work done. I have consolidated everything I learned in this blog, so that you don’t need to redo the research I did.
For simple FTP, all we need is an IP address, a username and a password. To understand how Lambda works, it is better to start with FTP than to jump directly to SFTP.
I regularly use an Amazon EC2 Linux instance for my machine learning exercises, so I set up my FTP host on that EC2 instance for testing purposes. That turned out to be a good choice, for reasons I will explain in later sections.
The steps to set up an FTP host on an Amazon EC2 Linux instance are given here.
I set up two buckets: one for the source (where a file will be dropped) and another for the destination (where the file will be moved after the FTP transfer).
Give the necessary permissions to the buckets.
Now, let us proceed to create a Lambda function. If you are new to AWS, don’t be put off by all the new terminology. Just create an AWS account and search for S3, Lambda and EC2. The AWS management console is very user friendly, and all the help you need is available in the AWS documentation.
To create a Lambda function, click on “Create Lambda Function” in the Lambda console.
Select “Blank Function”
Lambda works based on triggers, so we have to set a trigger event. Click on the highlighted area and then choose “S3 Bucket”.
While configuring the trigger, select the source bucket that we created in the first step. Set the event type as required. In this example, I have selected “Object Created (All)”, which triggers the Lambda function for any file dropped into that bucket, by any means.
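To make the trigger more concrete, here is a minimal sketch of how a handler might read the bucket name and object key from the event that S3 sends. The function name extract_s3_objects and the sample bucket and key are made up for illustration; the “Records” layout follows the S3 event notification format, in which object keys arrive URL-encoded.

```python
try:
    from urllib.parse import unquote_plus  # Python 3
except ImportError:
    from urllib import unquote_plus  # Python 2.7


def extract_s3_objects(event):
    """Return a list of (bucket, key) pairs from an S3 trigger event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-source-bucket"},
                "object": {"key": "reports/daily+sales.csv"}}}
    ]
}
print(extract_s3_objects(sample_event))
# [('my-source-bucket', 'reports/daily sales.csv')]
```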
Under “Configure Functions”, give a relevant name and description. Select Python 2.7 as the runtime.
Lambda Function Code
Select the code entry type as “Edit Code inline”. For FTP, we only require the “ftplib” package, which ships with Python 2.7 by default, so we don’t need any extra package and the Python code can be entered inline.
Note: In the case of SFTP, we require an external package, so we need a different approach for packaging our Python code. We shall discuss that approach in detail in the SFTP section.
Copy and paste the code from here – FTP through Lambda
Lambda lets developers keep certain configuration values in environment variables, so that those values can be changed without changing the code.
In this example, I have set the following variables.
“remoteDirectory” – if the Lambda function has to change to a particular remote directory (the “CD” command), enter that directory path in this variable.
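As a rough sketch of how such variables can be read inside the function, the snippet below collects the FTP settings from the environment. Only “remoteDirectory” is named in this post; “ftpHost”, “ftpUser” and “ftpPassword” are hypothetical names chosen for illustration, so substitute whatever names you configured.

```python
import os


def load_ftp_settings(env=None):
    """Collect FTP settings from environment variables.

    Only "remoteDirectory" comes from the post; the other three variable
    names are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    return {
        "host": env["ftpHost"],
        "user": env["ftpUser"],
        "password": env["ftpPassword"],
        # Optional: an empty value means "stay in the login directory".
        "remote_dir": env.get("remoteDirectory", ""),
    }
```

Passing a plain dictionary instead of os.environ makes the function easy to try outside Lambda. A missing required variable raises a KeyError, which shows up in the function’s logs.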
Lambda function handler and role
Leave the Handler at its default value, “lambda_function.lambda_handler”. For the Role, get input from your AWS administrator and assign or create a role. Make sure the role has the relevant permissions to add and remove files from the S3 buckets.
There is no need to worry about the advanced settings, unless you have to work within a custom VPC.
That’s it for FTP! These are all the steps required to do FTP through a Lambda function. The code linked above does the following:
- Transmit the file dropped in the source S3 bucket to the FTP host
- Take a backup of the data in the destination S3 bucket
- Delete the file dropped in the source S3 bucket
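The three steps above could be sketched roughly as follows. This is not the code linked from this post, only an illustration under stated assumptions: the function name transfer_object is invented, and the S3 and FTP clients are passed in as arguments so the flow can be exercised without a live connection.

```python
import os
import tempfile


def transfer_object(s3, ftp, source_bucket, key, backup_bucket, remote_dir=""):
    """Send one S3 object to the FTP host, back it up, then remove the original."""
    filename = os.path.basename(key)
    # Lambda only allows writes under its temp directory (/tmp).
    local_path = os.path.join(tempfile.gettempdir(), filename)

    # 1. Download from the source bucket and transmit to the FTP host.
    s3.download_file(source_bucket, key, local_path)
    if remote_dir:
        ftp.cwd(remote_dir)
    with open(local_path, "rb") as handle:
        ftp.storbinary("STOR " + filename, handle)

    # 2. Keep a backup copy in the destination bucket.
    s3.copy_object(Bucket=backup_bucket, Key=key,
                   CopySource={"Bucket": source_bucket, "Key": key})

    # 3. Delete the original from the source bucket and clean up /tmp.
    s3.delete_object(Bucket=source_bucket, Key=key)
    os.remove(local_path)
    return filename
```

In a real handler, s3 would typically be boto3.client("s3") and ftp an ftplib.FTP(host) object after calling login(user, password). The source file is deleted only after both the transfer and the backup have succeeded, so a failure part-way leaves the original in place.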
The steps for creating the S3 buckets are the same as those described in the FTP section. The differences are in the Python code, the packaging of the code, and the way we have to set up the Lambda function.
Again, I used an Amazon EC2 Linux instance as my SFTP host. Follow this link to create the SFTP host on an EC2 instance.
Make sure you have all the ports opened as mentioned here.
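A quick way to confirm that a port is actually reachable is a plain TCP connection attempt. The sketch below assumes the SFTP service listens on port 22, the usual SSH port; the list of ports from the link is not reproduced here, so adjust the number to match your setup. The function name port_is_open is made up for this example.

```python
import socket


def port_is_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, unresolved host name, or timed out.
        return False
    conn.close()
    return True
```

A False result from outside AWS usually points at the instance’s security group rules or at the service not running, but this check only tells you the port is closed, not why.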
For SFTP, I have set up three buckets, with relevant permissions.
- sftpbucket1 – source bucket, where an external application will drop files.
- sftpbucketbk – this is where the copy of the file will be stored.
- sftpkeybucket – this is where we keep the private key in OpenSSH format. We shall see how to get this key in the coming sections.
To connect to an SFTP host, you require a private key. As I had created the SFTP host on my Amazon EC2 Linux instance, I got the key (*.pem file) from the AWS management console. The steps below show how to use the private key with Lambda.
For an EC2 instance, you can download the key pair file (*.pem) through the AWS management console. Save this file on your local PC. Then open PuttyGen and load the *.pem file. It then shows you the public key (which you put in authorized_keys, as mentioned in the above link). Save the private key file (*.ppk).
You can use this private key file (*.ppk) to do SFTP between the FileZilla client and your EC2 instance.
But for the purpose of Lambda, we cannot use this private key directly in *.ppk format; it has to be in OpenSSH format. Open PuttyGen once again and load the private key (*.ppk). Under the “Conversions” menu, click on “Export OpenSSH key”. Then save the file. I used the extension *.pub; the extension is not important, although .pub conventionally denotes a public key, so another name may cause less confusion.
Upload this private key, which you have exported to OpenSSH format, into the key bucket (in this example, I have uploaded it to “sftpkeybucket”).
Note: The reason we have to do this conversion (from *.ppk to OpenSSH format) is that, for SFTP with Python, we use the library “paramiko”. This library does not understand the *.ppk format; the key has to be in OpenSSH format.
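If you are unsure which format a key file is in, its first line gives it away. The helper below is a small sketch of that check; detect_key_format is an invented name, and it relies only on the well-known headers: PuTTY files begin with “PuTTY-User-Key-File-”, while exported OpenSSH/PEM private keys begin with a “-----BEGIN ... PRIVATE KEY-----” line.

```python
def detect_key_format(key_text):
    """Classify private key text as 'ppk', 'openssh', or 'unknown'."""
    lines = key_text.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    if first_line.startswith("PuTTY-User-Key-File-"):
        return "ppk"
    if first_line.startswith("-----BEGIN") and first_line.endswith("PRIVATE KEY-----"):
        return "openssh"
    return "unknown"
```

Running this on the file before uploading it to the key bucket can save a round of debugging in Lambda, since a *.ppk file will be rejected when paramiko tries to load it.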
Creating Lambda function
The process is the same as that for FTP. The difference is in the Lambda function code.
For FTP, we only required basic Python packages, which are part of the Lambda execution environment. But for SFTP, we need the library “paramiko”, which is not present in the default Lambda execution environment. For this kind of scenario, Lambda allows the developer to package all the dependencies along with the actual Python code and upload them to Lambda as a zip file.
Click here to get the code, to perform SFTP from Lambda.
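For orientation, the core of an SFTP upload with paramiko can be sketched as below. This is an illustration, not the linked code: open_sftp and upload_file are invented names, port 22 is assumed, the key is assumed to be an RSA key in OpenSSH format, and the sketch does not verify the server’s host key, which you may want to add for anything beyond testing.

```python
import os


def open_sftp(host, username, key_path, port=22):
    """Open an SFTP session authenticated with an OpenSSH-format RSA private key."""
    import paramiko  # imported here so this module loads even without paramiko

    key = paramiko.RSAKey.from_private_key_file(key_path)
    transport = paramiko.Transport((host, port))
    # Note: no host key is supplied, so the server's identity is not checked.
    transport.connect(username=username, pkey=key)
    return paramiko.SFTPClient.from_transport(transport), transport


def upload_file(sftp, local_path, remote_dir=""):
    """Upload local_path, optionally changing into remote_dir first."""
    if remote_dir:
        sftp.chdir(remote_dir)
    remote_name = os.path.basename(local_path)
    sftp.put(local_path, remote_name)
    return remote_name
```

A caller would open the session with open_sftp, call upload_file for each file, and then close both the SFTP client and the transport, ideally in a try/finally block so the connection is released even when the upload fails.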
Packaging the Python Code
As we are using the paramiko library, we have to zip all the dependencies into a single file and upload it to Lambda. We need to remember that (at the time of writing) Lambda supports only Python 2.7, so all our dependencies should be collected for Python 2.7. I find the easiest way to do this is on an EC2 Linux instance. I had Amazon Linux, which came with Python 2.7 preinstalled.
The steps for packaging are given here. (Search for the section “worker function” in the link.)
Depending on the Linux version you use, you might face some problems. Below are some of the issues I faced while following the steps.
- While installing paramiko, it required a dependent package, libssl
  - If you are using a Debian/Ubuntu type of Linux, use “sudo apt-get install libssl-dev”
  - If you are using a Red Hat/CentOS type of Linux, use “sudo yum install openssl-devel”
- While zipping the contents, you have to make sure you are zipping the entire folder contents (for site-packages), including the hidden folders. To achieve that, if you are using a CentOS’ish type of Linux, run “zip -r /path/of/zip/file/virtual.zip .” from inside the folder.
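The same zipping step can also be done from Python, which avoids differences between zip versions. The sketch below is an alternative to the zip command, not part of the linked steps; zip_directory is an invented name. It stores every file under a folder, hidden ones included, with paths relative to that folder so that the packages sit at the root of the archive.

```python
import os
import zipfile


def zip_directory(source_dir, zip_path):
    """Write every file under source_dir (hidden ones included) into zip_path."""
    zip_abs = os.path.abspath(zip_path)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # os.walk does not skip dot-files or dot-folders.
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                if os.path.abspath(full_path) == zip_abs:
                    continue  # never add the archive to itself
                # Paths relative to source_dir keep packages at the zip root.
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
                added.append(arcname)
    return sorted(added)
```

Note that mode "w" creates a fresh archive each time. To combine several folders into one zip, as repeated zip -r runs do, you would open the archive once and walk each folder in turn.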
Once our zip file is ready, we have to upload it to Lambda. For this example, I just had to add the package “paramiko” and its dependencies, which produced a 6.5 MB zip file. Uploading it to Lambda manually through the AWS management console is possible, but it is a highly inefficient way of doing it. A better way is to go through an S3 bucket.
In my case, as I was using an Amazon EC2 Linux instance as my FTP and SFTP host, I used the same instance to develop my Python code as well. Once the code was ready and the zip file was prepared, I used “S3CMD” to upload the file to a specific S3 bucket.
It takes a few minutes to set up S3CMD on your EC2 instance, but it is totally worth it. Follow this link to set up S3CMD.
As per the link, you have to install S3CMD (if you are using a CentOS’ish Linux, replace “apt-get” with “yum”) and then do a one-time configuration with the command s3cmd --configure.
During this step, you will need an Access Key and a Secret Key. You can get these keys with the steps below.
Select “Security Credentials” in the AWS management console.
Select “Create New Access Key”
Click “Show Access Key”
You should now see your “Access Key” and “Secret Access Key”. Copy and paste them where you are configuring S3CMD.
Now that S3CMD is configured, you can transfer files from the EC2 instance to an S3 bucket very quickly with the following command.
s3cmd put virtual.zip s3://sftpbucketbk/virtual.zip
In the above command, “virtual.zip” is the file on my EC2 instance. I’m transferring it to the S3 bucket “sftpbucketbk” under the name “virtual.zip”.
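If you would rather do the upload from Python than from S3CMD, boto3 offers an equivalent call. The sketch below is an alternative under stated assumptions, not what I used: split_s3_uri and upload_package are invented helper names, and the S3 client is passed in (for example boto3.client("s3"), with credentials already configured on the machine).

```python
def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError("expected an s3:// URI, got %r" % uri)
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError("URI must contain both a bucket and a key: %r" % uri)
    return bucket, key


def upload_package(s3_client, local_path, uri):
    """Upload local_path to the bucket and key named by an s3:// URI."""
    bucket, key = split_s3_uri(uri)
    # boto3's upload_file takes (Filename, Bucket, Key).
    s3_client.upload_file(local_path, bucket, key)
    return bucket, key
```

With a real client, upload_package(client, "virtual.zip", "s3://sftpbucketbk/virtual.zip") would mirror the s3cmd put command shown above.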
Once the zip file is uploaded to S3, go to the Lambda management console and select “Upload a file from Amazon S3” for the function package.
Under S3 link URL, give the URL of the zip file in your S3 bucket.
Ex – https://s3.amazonaws.com/sftpbucketbk/virtual.zip
One more important step is to configure the Python function name (the handler) in the Configuration tab.
In my configuration, PythonSFTP is my Python script file name (PythonSFTP.py), which is in the zip file, and “lambda_handler” is the handler function name within the script.
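The handler setting follows a “file.function” pattern: the part before the last dot names the Python file (without .py) and the part after it names the function. As a rough local analogy only (this is not how Lambda itself is implemented, and resolve_handler is an invented name), the lookup can be pictured like this:

```python
import importlib


def resolve_handler(handler):
    """Turn a 'module.function' string into the function object it names."""
    module_name, _, function_name = handler.rpartition(".")
    if not module_name or not function_name:
        raise ValueError("handler must look like 'module.function'")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)
```

If the file name or the function name in the setting does not match what is in the zip file, the import or the attribute lookup fails, which is why a typo in this field stops the function from running at all.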
I have also added two extra environment variables, key_bucket and privatekeyfilename, as we are going to authenticate using a private key and not a password.
I have also set remoteDirectory to “tempdir”, as an example.
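To show how these two variables might be used, here is a small sketch that downloads the private key from the key bucket into Lambda’s temporary directory so that paramiko can read it from disk. The function name fetch_private_key is made up, and the S3 client is passed in (for example boto3.client("s3")); the variable names key_bucket and privatekeyfilename are the ones set above.

```python
import os
import tempfile


def fetch_private_key(s3_client, env=None):
    """Download the private key named by the environment variables.

    Returns the local path of the downloaded key file.
    """
    env = os.environ if env is None else env
    bucket = env["key_bucket"]
    key_name = env["privatekeyfilename"]
    # Lambda only allows writes under its temp directory (/tmp).
    local_path = os.path.join(tempfile.gettempdir(), os.path.basename(key_name))
    s3_client.download_file(bucket, key_name, local_path)
    return local_path
```

The role assigned to the function needs read permission on the key bucket for this to work.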
These steps are sufficient to establish SFTP connectivity between Lambda and EC2 (or any other SFTP host).
If you intend to use an Amazon EC2 Linux instance, you will need Putty to connect to your instance.
- Download Putty (PuttyGen comes along with it, which helps you with public and private keys)
- Download *.pem file
- Use PuttyGen to load the *.pem file; it will give you the public key and the private key. Click “Save Private Key” to save the *.ppk file.
- Now, open Putty
- Under Host name, give ec2-user@publicDNSOfYourAmazonInstance
Note: For Amazon Linux, the user name is ec2-user, for the rest, refer here
- Expand SSH (under connection) and click on Auth
- Here click Browse to load your private key (*.ppk file)
That is it; you will be connected to your EC2 instance.
If you want to test your SFTP connectivity before testing it through Lambda, download the FileZilla client (https://filezilla-project.org/download.php).
- In the Host field, type sftp://publicIPOfYourInstance, then type your SFTP username.
- Under the Edit menu, click Settings and then click on SFTP. There you add your private key (*.ppk file)
- Then click QuickConnect.
If we have to change our code for any reason, we would spend a considerable amount of time re-typing multiple commands to repackage it. To avoid that, create a bash script file on your Linux instance with the following commands.
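rm -f virtual.zip
zip virtual.zip PythonSFTP.py
# cd into a site-packages folder of your virtual environment before this step (the path depends on your setup)
zip -r /home/ec2-user/virtual/virtual.zip .
# cd into the second site-packages folder, if your environment has one, and repeat
zip -r /home/ec2-user/virtual/virtual.zip .
# cd back to /home/ec2-user/virtual before uploading
s3cmd put virtual.zip s3://sftpbucketbk/virtual.zip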
Save the file with any name (say buildpackage.sh). Now make this file executable with the following command.
chmod +x buildpackage.sh
Now you can run the script just by typing “./buildpackage.sh”. This deletes the existing zip file, re-zips the Python script along with all the dependencies, and transfers the file to the S3 bucket.