Feed: Planet big data.
Author: Praveen Sripati.
In this blog, we will be analyzing the number of users coming to a website from different IP addresses. This is again a lengthy post, in which we will use a few AWS services (ELB, EC2, S3 and Athena) and see how they work together. Here are the high-level steps, which we will explore in a bit more detail:
– Create two Linux EC2 instances running web servers with different content
– Create an Application Load Balancer that forwards requests to the above web servers
– Enable access logging from the Application Load Balancer to S3
– Analyze the logging data using Athena
To take this further, the following could be done (not covered in this article):
– Create a Lambda function that runs the Athena query at regular intervals
– Auto Scale the EC2 instances based on resource utilization
– Remove the Load Balancer logs from S3 after a certain period
Step 1: Create two Linux instances and install web servers on them as described in this blog. Place the files listed below in the /var/www/html folder. Ports 22 and 80 have to be opened in the security group, for SSH access to the instances and for viewing the web pages in a browser respectively.
server1 – index.html
server2 – index.html and img/someimage.png
Make sure that ip-server1, ip-server2 and ip-server2/img/someimage.png are accessible from a web browser, where ip-server1 and ip-server2 are the public IP addresses of the two instances. Note that the image must be inside the img folder. index.html serves the web pages and is also used for the health check, while the image is used to demonstrate path-based routing to server2.
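The same check can be scripted instead of clicking through a browser. The sketch below uses only the Python standard library; expected_urls and check_urls are hypothetical helper names, and the IP arguments stand in for the instances' public addresses.

```python
import urllib.error
import urllib.request


def expected_urls(ip_server1, ip_server2):
    """The three URLs from Step 1 that should be reachable."""
    return [
        f"http://{ip_server1}/",
        f"http://{ip_server2}/",
        f"http://{ip_server2}/img/someimage.png",
    ]


def check_urls(urls, timeout=5.0):
    """Map each URL to its HTTP status code, or None if it could not be reached."""
    results = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status such as 404.
            results[url] = err.code
        except OSError:
            # Refused connection, DNS failure or timeout (URLError is an OSError).
            results[url] = None
    return results
```

For example, check_urls(expected_urls("203.0.113.10", "203.0.113.20")), with those two addresses being placeholders, should map every URL to 200. A value of None usually means port 80 is not reachable, and 404 means the file is missing from /var/www/html.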
Step 2: Create the first Target Group, target-group1.
Step 3: Attach the EC2 instances to the Target Group.
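If you prefer scripting over the console, Steps 2 and 3 map onto the elbv2 API calls create_target_group and register_targets. The sketch below only builds the keyword arguments, so it runs without AWS credentials; the helper names, VPC ID and instance IDs are hypothetical, and the actual boto3 calls are shown as comments.

```python
def target_group_request(name, vpc_id, health_check_path="/index.html"):
    """Keyword arguments for elbv2 create_target_group (HTTP on port 80)."""
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": 80,
        "VpcId": vpc_id,
        "TargetType": "instance",
        "HealthCheckProtocol": "HTTP",
        # index.html doubles as the health check page (see Step 1).
        "HealthCheckPath": health_check_path,
    }


def register_targets_request(target_group_arn, instance_ids, port=80):
    """Keyword arguments for elbv2 register_targets."""
    return {
        "TargetGroupArn": target_group_arn,
        "Targets": [{"Id": instance_id, "Port": port} for instance_id in instance_ids],
    }


# Hypothetical usage with boto3 (needs AWS credentials, so left commented out):
# import boto3
# elbv2 = boto3.client("elbv2")
# tg = elbv2.create_target_group(**target_group_request("target-group1", "vpc-0123456789abcdef0"))
# tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]
# elbv2.register_targets(**register_targets_request(tg_arn, ["i-0aaaaaaaaaaaaaaaa", "i-0bbbbbbbbbbbbbbbb"]))
```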
Step 4: Change the Target Group’s health check settings, for example by shortening the interval and lowering the healthy threshold, so that the instances are marked healthy faster.
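As a rough rule, a new target is marked healthy after HealthyThresholdCount consecutive successful checks spaced HealthCheckIntervalSeconds apart, so lowering either value shortens the wait. The sketch below compares the console defaults at the time of writing (a 30-second interval and a threshold of 5, which may change) with the minimum values of 5 seconds and 2 checks; the helper names are hypothetical.

```python
def health_check_settings(interval_seconds=5, timeout_seconds=4,
                          healthy_threshold=2, unhealthy_threshold=2):
    """Health check keyword arguments for elbv2 modify_target_group."""
    return {
        "HealthCheckIntervalSeconds": interval_seconds,
        # Keep the timeout below the interval.
        "HealthCheckTimeoutSeconds": timeout_seconds,
        "HealthyThresholdCount": healthy_threshold,
        "UnhealthyThresholdCount": unhealthy_threshold,
    }


def approx_seconds_to_healthy(settings):
    """Rough time until a new target is marked healthy: threshold x interval."""
    return settings["HealthCheckIntervalSeconds"] * settings["HealthyThresholdCount"]


defaults = health_check_settings(interval_seconds=30, timeout_seconds=5, healthy_threshold=5)
tuned = health_check_settings()
print(approx_seconds_to_healthy(defaults), approx_seconds_to_healthy(tuned))  # 150 10

# Hypothetical boto3 call (tg_arn is the target group's ARN):
# boto3.client("elbv2").modify_target_group(TargetGroupArn=tg_arn, **tuned)
```

The estimate is approximate because the first check may run sooner than a full interval after registration.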
Step 5: Create the second Target Group, target-group2, and register server2 with it, as shown in the flow diagram.
Step 6: Now it’s time to create the Application Load Balancer. This type of load balancer is relatively new compared to the Classic Load Balancer. Here is the difference between the different Load Balancers. The Application Load Balancer operates at layer 7 (the application layer) of the OSI model and supports host-based and path-based routing. Once the settings below are complete, any web request matching the ‘/img/*’ path pattern will be sent to target-group2, and all other requests will be sent to target-group1 by default.
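The routing behaviour described above can be simulated locally to see which target group a given path ends up in. This is a simplified model, not the ALB implementation: rules are assumed to be (priority, pattern, target group) tuples evaluated in priority order, and Python's fnmatchcase stands in for the ALB path-pattern matcher. Both are case sensitive and support the * and ? wildcards, but fnmatch also accepts [...] sets, which ALB path patterns do not. Only the path, without the query string, should be passed in.

```python
from fnmatch import fnmatchcase


def route(path, rules, default_target_group):
    """Return the target group an ALB-style listener would pick for a path.

    rules: list of (priority, path_pattern, target_group); the lowest priority
    number is evaluated first and the first matching rule wins. Requests that
    match no rule go to the default action's target group.
    """
    for _priority, pattern, target_group in sorted(rules):
        if fnmatchcase(path, pattern):
            return target_group
    return default_target_group


RULES = [(1, "/img/*", "target-group2")]

for p in ["/", "/index.html", "/img/someimage.png"]:
    print(p, "->", route(p, RULES, "target-group1"))
# / -> target-group1
# /index.html -> target-group1
# /img/someimage.png -> target-group2
```

Note that /img on its own does not match /img/*, so it falls through to target-group1.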
Step 7: Associate target-group1 with the Load Balancer so that it receives requests by default; target-group2 will be associated later.
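Through the API, Step 7 corresponds to creating an HTTP listener whose default action forwards to target-group1, and the later association of target-group2 corresponds to a listener rule with a path-pattern condition. The sketch below only builds the keyword arguments for the elbv2 create_listener and create_rule calls. The helper names and ARNs are hypothetical placeholders, and priority 1 is an arbitrary choice.

```python
def listener_request(load_balancer_arn, default_target_group_arn):
    """Keyword arguments for elbv2 create_listener: HTTP on port 80, forwarding
    every request that matches no rule to the default target group."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTP",
        "Port": 80,
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": default_target_group_arn}],
    }


def path_rule_request(listener_arn, target_group_arn, path_pattern="/img/*", priority=1):
    """Keyword arguments for elbv2 create_rule: forward matching paths to a target group."""
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [
            {"Field": "path-pattern", "PathPatternConfig": {"Values": [path_pattern]}},
        ],
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


# Hypothetical boto3 usage (lb_arn, tg1_arn and tg2_arn come from earlier calls):
# elbv2 = boto3.client("elbv2")
# listener = elbv2.create_listener(**listener_request(lb_arn, tg1_arn))
# listener_arn = listener["Listeners"][0]["ListenerArn"]
# elbv2.create_rule(**path_rule_request(listener_arn, tg2_arn))
```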
Step 8: Enable access logs on the Load Balancer by editing the attributes. The specified S3 bucket for storing the logs will be automatically created.
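The same attributes can be set through the API with modify_load_balancer_attributes. A minimal sketch, assuming a hypothetical bucket name and prefix. Note that this API call does not create the bucket: it must already exist, with a bucket policy that lets Elastic Load Balancing write to it.

```python
def access_log_attributes(bucket, prefix=""):
    """Attributes for elbv2 modify_load_balancer_attributes that turn on access logs.

    All attribute values are passed as strings, including the boolean flag.
    """
    return [
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": bucket},
        {"Key": "access_logs.s3.prefix", "Value": prefix},
    ]


# Hypothetical boto3 usage (lb_arn is the load balancer's ARN):
# boto3.client("elbv2").modify_load_balancer_attributes(
#     LoadBalancerArn=lb_arn,
#     Attributes=access_log_attributes("my-alb-logs-bucket", "demo"),
# )
```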
A few minutes after the Load Balancer has been created, the instances should turn healthy, as shown below. If they do not, one of the above steps may have been missed.
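The same health information is available from the elbv2 describe_target_health call, which is handy when an instance stays unhealthy, because the response includes a reason code. The helper below (a hypothetical name) summarizes such a response; the sample response is made up but follows the documented shape.

```python
from collections import Counter


def summarize_target_health(response):
    """Summarize an elbv2 describe_target_health response.

    Returns a Counter of target states and a list of (instance id, state, reason)
    for every target that is not healthy yet.
    """
    states = Counter()
    not_healthy = []
    for desc in response.get("TargetHealthDescriptions", []):
        health = desc["TargetHealth"]
        states[health["State"]] += 1
        if health["State"] != "healthy":
            not_healthy.append((desc["Target"]["Id"], health["State"], health.get("Reason", "")))
    return states, not_healthy


# Made-up response in the documented shape; in practice it would come from
# boto3.client("elbv2").describe_target_health(TargetGroupArn=tg_arn).
sample = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "i-0aaaaaaaaaaaaaaaa", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
        {"Target": {"Id": "i-0bbbbbbbbbbbbbbbb", "Port": 80},
         "TargetHealth": {"State": "unhealthy", "Reason": "Target.Timeout"}},
    ]
}
states, not_healthy = summarize_target_health(sample)
print(dict(states), not_healthy)
```

A Target.Timeout reason, for instance, can indicate that port 80 on the instance is not reachable from the load balancer.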
Step 11: Once the Load Balancer has been accessed a few times from different browsers, the log files should appear in S3, as shown below. The load balancer delivers its log files every 5 minutes, so there can be a short delay.
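Elastic Load Balancing writes the log files under a fixed key layout inside the bucket: the optional prefix, then AWSLogs/<account id>/elasticloadbalancing/<region>/<yyyy>/<mm>/<dd>/. The helper below (a hypothetical name, with a made-up bucket, account ID and region in the example) builds that location. The region-level path without a date is a natural choice for the location of the Athena table in Step 12.

```python
from datetime import date


def alb_log_location(bucket, account_id, region, prefix="", day=None):
    """S3 URI of the folder holding ALB access logs.

    With day=None the URI stops at the region level (all days); otherwise it
    points at the yyyy/mm/dd folder for that day.
    """
    parts = [bucket]
    if prefix:
        parts.append(prefix)
    parts += ["AWSLogs", account_id, "elasticloadbalancing", region]
    if day is not None:
        parts.append(day.strftime("%Y/%m/%d"))
    return "s3://" + "/".join(parts) + "/"


print(alb_log_location("my-alb-logs-bucket", "123456789012", "us-east-1", "demo", date(2017, 11, 5)))
# s3://my-alb-logs-bucket/demo/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2017/11/05/
```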
Step 12: Now it’s time to create a table in Athena, map it to the log data in S3 and query it. The DDL and DML commands for Athena can be found here.
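The core aggregation, counting requests per client IP, can also be tried locally on a downloaded log file, which helps to sanity-check the Athena results. The sketch below relies on the first four space-separated fields of an ALB log entry being type, time, load balancer and client:port, none of which contain spaces. It assumes IPv4 clients, and the helper names and sample lines are made up (the sample entries are truncated after the request field).

```python
import gzip
from collections import Counter


def client_ip(log_line):
    """Return the client IP of one ALB access log entry.

    Fields 0-3 are type, time, elb and client:port; split only as far as needed
    so the quoted fields later in the line are left alone.
    """
    client_and_port = log_line.split(" ", 4)[3]
    return client_and_port.rsplit(":", 1)[0]


def count_requests_per_ip(lines):
    """Counter mapping client IP -> number of requests, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[client_ip(line)] += 1
    return counts


def read_log_file(path):
    """ALB log files are delivered gzip-compressed (.log.gz)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.readlines()


# Made-up, truncated entries in the ALB log format.
SAMPLE = [
    'http 2017-11-05T10:00:01.000000Z app/my-alb/50dc6c495c0c9188 203.0.113.10:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 "GET http://example.com:80/ HTTP/1.1"',
    'http 2017-11-05T10:00:02.000000Z app/my-alb/50dc6c495c0c9188 198.51.100.7:51000 10.0.0.2:80 0.000 0.002 0.000 200 200 34 512 "GET http://example.com:80/img/someimage.png HTTP/1.1"',
    'http 2017-11-05T10:00:03.000000Z app/my-alb/50dc6c495c0c9188 203.0.113.10:2818 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 "GET http://example.com:80/ HTTP/1.1"',
]

counts = count_requests_per_ip(SAMPLE)
print(len(counts), "distinct IPs:", counts.most_common())
# 2 distinct IPs: [('203.0.113.10', 2), ('198.51.100.7', 1)]
```

For a real file, count_requests_per_ip(read_log_file(path)) does the same on a downloaded .log.gz file. In Athena, the equivalent is a GROUP BY on the client IP column of the table created from the DDL; in the AWS documentation's sample DDL that column is named client_ip.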
We have seen how to create a Load Balancer, associate Linux web servers with it and finally query the log data with Athena. Make sure that all the AWS resources created along the way are deleted to stop being billed for them.
That’s it for now.