Efficiently working with Spark partitions

It’s been quite some time since my last article, but here is the second one in the Apache Spark series. For those of you who are new to Spark, please refer to the first part of my previous article, which introduces the framework and its uses. In this article, I will show how to execute specific code on different partitions... [Read More]
Tags: spark scala

Writing your own Gaussian Mixture Model Spark Estimator

Apache Spark is an open source framework for distributed computation. It is particularly well suited to Big Data, speeding up data analysis and processing. Spark is also known for its structured architecture, which allows customization. One of Spark's key features is its Estimators, an abstraction of any learning algorithm. In order to get a... [Read More]

Scaling the A3C algorithm to multiple machines with TensorFlow.js

While working on reinforcement learning and its application to web crawlers, I came across the A3C algorithm. The original A3C approach had flaws and drawbacks when applied to the environment I set up for my needs. This blog post presents a different approach to the A3C algorithm, allowing us to scale it to multiple machines... [Read More]

Implementing SARSA(λ) in Python

This post shows how to implement the SARSA algorithm with eligibility traces in Python. It is part of a series of articles about reinforcement learning that I will be writing. Please note that I will go into further detail as soon as I can. This is the first version of this article and I simply published the code, but I... [Read More]

Browser fingerprints in a nutshell

Internet privacy has been a recurring subject over the last few years, as multiple social media platforms, such as Facebook and Twitter, have found themselves caught in a ton of controversies and have been in the spotlight ever since. This shows a trend: the average user is waking up and looking at their internet privacy from a new... [Read More]
Tags: privacy

Visualizing convolutional neural network outputs

Convolutional neural networks (CNNs) are the type of neural network most likely to let us understand what is happening internally since, as opposed to many other types of neural nets (I am thinking of GANs, for example), CNNs are basically a representation of visual concepts. Whether for the purpose of debugging or for the personal satisfaction of visualizing the... [Read More]

Deploy your blog using Ghost and Docker on AWS

When I first heard about the Ghost platform for blogging, I was quite impressed by its simplicity, but at the same time, the number of steps I had to go through before getting a running Ghost process on my server was a bit too much. I didn’t want something that would waste my time but rather something simple... [Read More]
Tags: aws ghost-tag

A bit of a boring introduction

Here we are. I’ve finally stepped up and decided to set up my very own blog. I never really understood the concept of blogging until I discovered that most of the meaningful things I have ever learned came from blog articles written by mainstream developers, engineers, or researchers. [Read More]