Photo by Nathan Dumlao on Unsplash

Hands-on Tutorial

The basic theory and implementation of walk-forward optimization as a cross-validation technique for time-series data

After reading this short article, you will understand the basic theory and implementation of walk-forward optimization for time-series modelling. It also answers common questions, such as why data scientists should apply walk-forward optimization to their time-series data.

Furthermore, in the last section, we will compare walk-forward optimization with cross-validation techniques commonly used for cross-sectional data, such as k-fold. Does walk-forward optimization make a significant difference to model performance?

Keep reading and enjoy the trip!

Walk-forward optimization

Before going deeper into walk-forward optimization, let’s talk about time-series data. Why does it…
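As a rough illustration of the idea (not the article's own code, which is not shown here), walk-forward validation trains on one window of past observations and tests on the window that immediately follows, then slides forward so that the model never sees the future. The function name `walk_forward_splits` and its parameters are hypothetical; scikit-learn's `TimeSeriesSplit` offers a ready-made alternative.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n_samples: int, train_size: int, test_size: int,
                        expanding: bool = False) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs that always move forward in time.

    Hypothetical helper: with expanding=False the training window slides
    (rolling window); with expanding=True it always starts at index 0.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    end_train = train_size
    # Stop once a full test window no longer fits in the series
    while end_train + test_size <= n_samples:
        train = list(range(0 if expanding else start, end_train))
        test = list(range(end_train, end_train + test_size))
        yield train, test
        # Move both windows forward by one test window
        start += test_size
        end_train += test_size


if __name__ == "__main__":
    for train, test in walk_forward_splits(10, train_size=4, test_size=2):
        print(train, "->", test)
    # [0, 1, 2, 3] -> [4, 5]
    # [2, 3, 4, 5] -> [6, 7]
    # [4, 5, 6, 7] -> [8, 9]
```

Unlike k-fold, every test index is strictly later than every training index in the same split, which is what keeps information from the future out of the training data.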


Photo by Michael Dziedzic on Unsplash

Hands-on Tutorial

The basic theory and tutorial on how to simulate the central limit theorem and law of large numbers using R

Getting started with Statistics simulation using R

In statistics, the central limit theorem and the law of large numbers play an important role, for instance, in hypothesis testing. The central limit theorem states that sample means are approximately normally distributed regardless of whether the population distribution is skewed, provided the population has a finite variance and the samples are large enough. In this tutorial, we will run the simulation using 5 different distributions.

Central limit theorem

The central limit theorem states that when we take sufficiently large random samples of size n (with replacement) from a population with mean μ and standard deviation σ, the distribution of the sample means is approximately normal with mean μ and standard deviation σ/√n.

According to J…
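The article runs its simulation in R; the snippet below is only a Python sketch of the same idea, using a skewed exponential population (rate 1, so μ = 1 and σ = 1) as an assumed example. The helper `sample_means` is hypothetical. If the theorem holds, the simulated means should centre on μ with a spread close to σ/√n.

```python
import math
import random
import statistics


def sample_means(draw, n: int, n_samples: int, seed: int = 42) -> list:
    """Draw n_samples independent samples of size n and return their means."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(n_samples)]


# Exponential population with rate 1: skewed, with mu = 1 and sigma = 1
n = 50
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, n_samples=2000)

print(statistics.fmean(means))   # should be close to mu = 1
print(statistics.stdev(means))   # should be close to sigma / sqrt(n)
print(1 / math.sqrt(n))          # theoretical sigma / sqrt(n), about 0.141
```

Swapping the lambda for another `random` method (for example `rng.uniform(0, 1)`) repeats the experiment for a different population distribution.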


Photo by Nong Vang on Unsplash

Hands-on Tutorial

Understand how binary search works by hand and how to implement it using Python

After reading this article, you will understand the difference between linear search and binary search, how to perform searching tasks with both algorithms, and why binary search is much faster than linear search on sorted data.

To show how the search works, the article includes a detailed step-by-step illustration of the idea behind binary search.

Keep reading and enjoy exploring the programming world!

Searching problem

In programming, many algorithms need to perform a search. For instance, to compare elements between two lists, we must check the elements one by one…
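A minimal sketch contrasting the two approaches (not the article's code): linear search checks every element in turn, while binary search repeatedly halves a sorted list, so it needs at most about log2(n) comparisons instead of up to n. The sample list is made up for illustration; Python's standard `bisect` module provides a production-ready equivalent.

```python
from typing import Optional, Sequence


def linear_search(items: Sequence[int], target: int) -> Optional[int]:
    """Check elements one by one; works on unsorted data."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return None


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Halve the search range each step; items must be sorted ascending."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return None


data = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(data, 23))  # 5
print(linear_search(data, 23))  # 5
print(binary_search(data, 7))   # None
```

For the lookup of 23, binary search inspects only indices 4, 7 and 5, whereas linear search walks through indices 0 to 5.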


Photo by Jay Zhang on Unsplash

Hands-on Tutorial

How a typo corrector can improve the string matching algorithm for finding the best-matching string in the lexicon (master data)

After reading this short article, you will understand how graph theory can be applied to improve a string matching algorithm. When correcting a typo, the typo corrector not only calculates the string similarity between the input and the entries in the master database but also considers the keyboard distance between characters.

To get the distance between characters on the keyboard, we use graph theory: the letters of a QWERTY keyboard are restructured into a graph. Keep reading to find out how to apply graph theory to keyboard typos.

Problem identification

Typos happen in writing. If…
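The article's exact keyboard graph is not shown in this preview, so the following is only a sketch under an assumed adjacency rule: keys are connected to their left and right neighbours in the same row, and to the one or two keys that touch them in the staggered row below. The distance between two characters is then the shortest-path length, found with breadth-first search. `build_keyboard_graph` and `key_distance` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Set

# Letter rows of a QWERTY keyboard (punctuation keys ignored for simplicity)
ROWS: List[str] = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_keyboard_graph(rows: List[str] = ROWS) -> Dict[str, Set[str]]:
    """Build an undirected adjacency graph of letter keys (assumed adjacency rule)."""
    graph: Dict[str, Set[str]] = {ch: set() for row in rows for ch in row}
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            # Horizontal neighbour in the same row
            if c + 1 < len(row):
                graph[ch].add(row[c + 1])
                graph[row[c + 1]].add(ch)
            # Assumption: key (r, c) touches keys c-1 and c in the row below
            if r + 1 < len(rows):
                below = rows[r + 1]
                for dc in (-1, 0):
                    if 0 <= c + dc < len(below):
                        graph[ch].add(below[c + dc])
                        graph[below[c + dc]].add(ch)
    return graph


def key_distance(a: str, b: str, graph: Dict[str, Set[str]]) -> int:
    """Number of key hops between a and b, found with breadth-first search."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path between {a!r} and {b!r}")


graph = build_keyboard_graph()
print(key_distance("s", "d", graph))  # 1: neighbouring keys
print(key_distance("q", "z", graph))  # 2: q -> a -> z
```

A small distance suggests a likely slip of the finger, so a corrector could weight a character substitution by this distance when ranking candidate strings.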


Photo by Souvik Banerjee on Unsplash

Hands-on Tutorial

Implementation of Optical Character Recognition (OCR) on Instagram during the Coronavirus pandemic

Is this your first time hearing about image preprocessing and Optical Character Recognition (OCR)? Don’t worry: in this tutorial, you will gain a basic understanding of image preprocessing and OCR in just one short article.

After reading this tutorial, you will understand image preprocessing, the basics of OCR, and a simplified implementation of OCR on the Instagram app.

Keep reading the tutorial and don’t forget to follow each step!

Optical Character Recognition

Optical Character Recognition (OCR) is a tool that tries to convert the characters or text in an image into an editable format such as txt or csv…


Photo by Daria Salikova on Unsplash

Hands-on Tutorial

How to extract the information and check the validity of an Indonesian ID card number using API

After reading this article, you will know the information behind the Indonesian ID card number and how to validate it properly according to Government policy. The API, built with Flask, automates the extraction and validation tasks.

Disclaimer!
This article is for educational purposes only. All the data is artificial, and you cannot get detailed information, such as name, picture, full address, job, or blood type, from the ID card number.

Enjoy, keep reading!

What about the Indonesian ID card?

The Indonesian ID card, well known in Bahasa Indonesia as KTP (Kartu Tanda Penduduk), is a single identity for residents who are over 17 years…


Photo by Hannah Busing on Unsplash

Hands-on Tutorial

A deep understanding of Cohen’s Kappa and Fleiss’ Kappa and how they measure the agreement between raters

After reading this short tutorial, you will understand how to calculate Cohen’s Kappa and Fleiss’ Kappa. You will also be able to interpret the result and assign a level of agreement between raters.

Cohen’s Kappa

Cohen’s Kappa is a metric used to measure the agreement between two raters. For instance, two raters are asked to assign one of 3 labels (A, B, or C) to each of 10 participants based on the participant’s skills. Using Cohen’s Kappa, we can measure their level of agreement beyond what would be expected by chance. In practice, Cohen’s Kappa is often used:

  • To measure the level of agreement between two raters on classifying the objects into a given…
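As a small sketch of the standard formula κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies, the function below scores the two-rater, three-label scenario described above. The ratings themselves are made up for illustration; scikit-learn's `cohen_kappa_score` computes the same quantity.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater1: Sequence[str], rater2: Sequence[str]) -> float:
    """Cohen's Kappa for two raters who labelled the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater1)
    # Observed agreement: share of items given the same label by both raters
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of each rater's label proportions, summed over labels
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for 10 participants (A, B or C)
rater_1 = list("AABBCCABCA")
rater_2 = list("AABBCBABCC")
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.701
```

Here the raters agree on 8 of 10 participants (p_o = 0.8), chance agreement is p_e = 0.4·0.3 + 0.3·0.4 + 0.3·0.3 = 0.33, so κ = 0.47 / 0.67 ≈ 0.701.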

Photo by Bozhin Karaivanov on Unsplash

Hands-on Tutorial

Determine which parts of your Python code take the most time to run

After following this tutorial step by step, you will gain new knowledge and experience in Python script profiling: how to profile your own script or function and how to determine which part of the function takes the most time to run.

Introduction to Python script profiling

When working in production, runtime is a concern alongside bugs in a Python script. Performance becomes an issue as our data volume grows, and the production script must then be restructured and optimized to improve its performance.

What if the scripts are too complex to check line by line…
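Instead of checking line by line, a profiler can report where the time goes. The sketch below uses the standard-library `cProfile` and `pstats` modules; the functions `loop_sum`, `generator_sum` and `pipeline` are hypothetical stand-ins for a real production script, and `profile_call` is an illustrative helper rather than the article's own code.

```python
import cProfile
import io
import pstats


def loop_sum(n: int) -> int:
    """Hypothetical workload: sum of squares with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def generator_sum(n: int) -> int:
    """Hypothetical workload: the same sum written with a generator expression."""
    return sum(i * i for i in range(n))


def pipeline(n: int = 200_000):
    """Stand-in for a script made of several steps."""
    return loop_sum(n), generator_sum(n)


def profile_call(func, *args, top: int = 10):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    # Sort by cumulative time so the most expensive call chains appear first
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(pipeline)
    print(report)
```

The report lists each function with its call count (`ncalls`), its own time (`tottime`) and its time including sub-calls (`cumtime`), which points directly at the step worth optimizing.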


Photo by Marc-Olivier Jodoin on Unsplash

Hands-on Tutorial

How to create and represent the data for social network analysis using Python

In this tutorial, we will analyze user interaction within a WhatsApp group. How active are the WhatsApp group members? Or how passive are they? Who are the main members of the group? Are there any subgroups?

Social network analysis

Social network analysis (SNA) is a descriptive analytics approach that studies and collects information about the interactions between individuals within a specific group (for example, on social media). SNA is also a powerful tool for recognizing and identifying changes in group structure.

SNA is widely used and developed in social media to find patterns or anomalies…
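A minimal sketch of how a group chat could be turned into network data, assuming (this is not stated in the preview) that each message is treated as a reply to the previous message's sender. The chat log and the helper names `activity`, `interaction_edges` and `weighted_degree` are hypothetical; a library such as `networkx` could take the resulting weighted edges for further analysis.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical chat log: (sender, message) in chronological order
chat: List[Tuple[str, str]] = [
    ("Ani", "Hi all"),
    ("Budi", "Hello!"),
    ("Ani", "Meeting at 7?"),
    ("Citra", "OK"),
    ("Budi", "Sure"),
    ("Ani", "Great"),
]


def activity(log: List[Tuple[str, str]]) -> Counter:
    """Number of messages sent by each member (how active they are)."""
    return Counter(sender for sender, _ in log)


def interaction_edges(log: List[Tuple[str, str]]) -> Counter:
    """Undirected, weighted edges between consecutive distinct senders (assumption)."""
    edges: Counter = Counter()
    for (prev, _), (curr, _) in zip(log, log[1:]):
        if prev != curr:
            edges[tuple(sorted((prev, curr)))] += 1
    return edges


def weighted_degree(edges: Counter) -> Dict[str, int]:
    """Total interaction weight per member, a simple centrality measure."""
    degree: Dict[str, int] = defaultdict(int)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


edges = interaction_edges(chat)
print(activity(chat))           # Ani: 3, Budi: 2, Citra: 1
print(dict(edges))              # ('Ani', 'Budi'): 3, ('Ani', 'Citra'): 1, ('Budi', 'Citra'): 1
print(weighted_degree(edges))   # Ani: 4, Budi: 4, Citra: 2
```

Message counts answer "how active is each member?", while the weighted degree hints at who sits at the centre of the conversation.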

Audhi Aprilliant

Data Scientist. Tech Writer. Statistics, Data Analytics, and Computer Science Enthusiast. Portfolio & social media links at http://audhiaprilliant.github.io/
