Writing an ETL pipeline for a multi-source scraper
One challenge of building a scraper that gets its data from multiple sources is designing an architecture that helps you extract, transform, and load the data into your database, and that is exactly where the ETL layer comes in. What’s an ETL in the context of a scraper? Maybe you’ve heard of ETL before and are thinking, “well, usually an ETL is a pipeline that takes multiple DBs, transforms the data in a staging area, and then loads it into a data warehouse”. You’d be right, that use case is also an ETL layer, but in our context, with a single database to fill, we will ...
Puppeteer advanced scraping
In this article we will talk about ways to bypass anti-bot firewalls using techniques such as stealth plugins, IP rotation, and residential proxy provisioning. We will not be talking about the scraping itself, but rather about the architecture that allows us to stay under the radar. If you want to see how to extract data from a website using Puppeteer, go read This Article. BE ADVISED!! I do not encourage using these techniques to scrape websites illegally. This article exists solely for educational purposes. Be mindful of how you use this knowledge; you are responsible for your own actions. ...
My Introduction to Terraform
How did I end up using Terraform? Some months ago, being the simple-minded rat that I am, I had never heard of Terraform or even Infrastructure as Code (IaC). The reason was simple: the projects I was building in the cloud didn’t need top-notch infrastructure. A few AWS EC2 instances did the trick, so I set everything up by hand or with the AWS CLI. But things changed when a client approached me with a project that was already up and running, and guess what? I stumbled upon a shared-infra directory filled with .tf files and a mysterious terragrunt.hcl. A full IaC setup already in place. ...
Introduction to test-driven development (TDD) explained by yours truly
Alright, so basically I want to build a feature for my client, but I want to do it right and stack the odds in my favor, and for that, what better than some test-driven development? We already have a testing framework, Mocha, and we’ll be using it along with Chai because the two go together so well. We will use them to describe a battery of tests for the controller of our new feature. ...
Creating CI/CD by utilizing Docker and GHCR
As part of a school project, I was assigned the task of building a CI/CD pipeline that will, as a first step, deploy the project to a remote VPS (a DigitalOcean droplet) so that we have a staging environment where the rest of the team can see the progress of the application. Below is a quick walkthrough of how I made it possible using Docker and GHCR ...