Statistics and Data Analytics Enthusiast. Portfolio & social media links at

Photo by Souvik Banerjee on Unsplash

Hands-on Tutorial

Implementation of Optical Character Recognition (OCR) on Instagram during the Coronavirus pandemic

Is this your first time hearing about image preprocessing and Optical Character Recognition (OCR)? Don’t worry: this tutorial gives you a basic understanding of both in one short article.

After reading this tutorial, you will understand image preprocessing, the basics of OCR, and an implementation of OCR in the Instagram app, built as a simplified example.

Keep reading the tutorial and don’t forget to follow each step!

Optical Character Recognition

Optical Character Recognition (OCR) is a technique that converts the characters or text in an image into an editable format such as txt or csv…

Photo by Daria Salikova on Unsplash

Hands-on Tutorial

How to extract the information and check the validity of an Indonesian ID card number using an API

After reading this article, you will know what information is encoded in the Indonesian ID card number and how to validate it according to government policy. The API, built with Flask, automates the extraction and validation tasks.

This article is for educational purposes only. All the data is artificial, and you cannot obtain detailed information such as name, picture, full address, job, or blood type from an ID card number.
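
To give a feel for the extraction and validation steps, here is a minimal sketch in plain Python. It is not the article's Flask API. The field layout used here (three two-digit region codes, a DDMMYY birth date with 40 added to the day for women, and a four-digit serial number) and the function name `parse_nik` are assumptions of this sketch, and the sample number is made up.

```python
def parse_nik(nik):
    """Split a 16-digit ID number into fields, or raise ValueError.

    The field layout is an assumption of this sketch, not an official spec.
    """
    if len(nik) != 16 or not (nik.isascii() and nik.isdigit()):
        raise ValueError("expected exactly 16 digits")

    day = int(nik[6:8])
    month = int(nik[8:10])

    # Assumed convention: the day field is stored as day + 40 for women.
    gender = "male"
    if day > 40:
        gender = "female"
        day -= 40

    if not 1 <= day <= 31 or not 1 <= month <= 12:
        raise ValueError("birth date field is out of range")

    return {
        "province": nik[0:2],
        "regency": nik[2:4],
        "district": nik[4:6],
        "day": day,
        "month": month,
        "year_two_digits": nik[10:12],  # the century is not encoded
        "gender": gender,
        "serial": nik[12:16],
    }


# Artificial number, used only for illustration.
info = parse_nik("3201234507900001")
print(info)
```

A real validator would also need to check the region codes against an official list, which this sketch leaves out.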

Enjoy, keep reading!

What is the Indonesian ID card?

The Indonesian ID card, known in Bahasa Indonesia as KTP (Kartu Tanda Penduduk), is a single identity for residents who are over 17 years…

Photo by Hannah Busing on Unsplash

Hands-on Tutorial

A deep understanding of Cohen’s Kappa and Fleiss’ Kappa for measuring the agreement between raters

After reading this short tutorial, you will understand how Cohen’s Kappa and Fleiss’ Kappa are calculated. You will also be able to interpret the result and assign a level of agreement between raters.

Cohen’s Kappa

Cohen’s Kappa is a metric used to measure the agreement between two raters. For instance, two raters are asked to assign one of 3 labels (A, B, or C) to each of 10 participants based on the participant’s skills. Using Cohen’s Kappa, we can measure their level of agreement. Theoretically, Cohen’s Kappa is often used:

  • To measure the level of agreement between two raters in classifying objects into a given…
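
The two-rater example above can be sketched in a few lines of Python. This is a minimal illustration using the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the ratings are made up for this sketch.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")

    return (observed - expected) / (1 - expected)


# Made-up labels for 10 participants.
rater_1 = list("AABBCCABCA")
rater_2 = list("AABBCCABCB")
print(round(cohens_kappa(rater_1, rater_2), 3))
```

A value of 1 means perfect agreement, and a value near 0 means the raters agree about as often as chance would predict.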

Photo by Bozhin Karaivanov on Unsplash

Hands-on Tutorial

Determine which parts of your Python code take more time to run

After working through this tutorial step by step, you will have gained knowledge of and experience with Python script profiling: how to profile your own script or function, and how to determine which part of a function takes the most time to run.

Introduction to Python script profiling

When working in production, execution time is a consideration alongside bugs in a Python script. It becomes a performance issue as our data volume grows bigger and bigger, and the production script must then be restructured and optimized to improve performance.
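
As a small illustration of what profiling looks like, the sketch below uses the standard library's `cProfile` and `pstats` modules to time a toy pipeline. The functions `slow_square_sum`, `fast_sum`, `pipeline`, and `profile_report` are hypothetical names invented for this example, and the article itself may use a different tool.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    # Deliberately slow: an explicit Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    # Faster: the loop runs inside the built-in sum().
    return sum(range(n))


def pipeline():
    slow_square_sum(200_000)
    fast_sum(200_000)


def profile_report(func, top=5):
    """Run func under cProfile and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive calls come first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_report(pipeline)
print(report)
```

In the printed table, the `cumtime` column shows how much time each function accounts for, including the functions it calls.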

What if the scripts are too complex to check line by line…

Photo by Marc-Olivier Jodoin on Unsplash

Hands-on Tutorial

How to create and represent the data for social network analysis using Python

In this tutorial, we will talk about the analysis of user interaction within a WhatsApp group. How active are the WhatsApp group members? Or how passive are they? Who’s the main member of the group? Is there any grouping?

Social network analysis

Social network analysis (SNA) is a form of descriptive analytics that studies and collects information about the interactions between individuals within a specific group (such as a social media group). SNA is also a powerful tool for recognizing and identifying changes in group structures.

SNA is widely used and developed in social media to find patterns or anomalies…
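
One simple way to represent such interactions is a weighted, directed graph in which an edge from one member to another counts how often the first replied to the second. The sketch below builds that structure with plain dictionaries. The member names, the reply pairs, and the helpers `build_graph` and `degree_summary` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (sender, replied_to) pairs taken from a group chat.
replies = [
    ("Ana", "Budi"),
    ("Budi", "Ana"),
    ("Citra", "Ana"),
    ("Ana", "Budi"),
    ("Dewi", "Ana"),
]


def build_graph(pairs):
    """Return graph[sender][target] = number of replies from sender to target."""
    graph = defaultdict(lambda: defaultdict(int))
    for sender, target in pairs:
        graph[sender][target] += 1
    return graph


def degree_summary(graph):
    """Return (replies sent per member, replies received per member)."""
    sent = defaultdict(int)
    received = defaultdict(int)
    for sender, targets in graph.items():
        for target, weight in targets.items():
            sent[sender] += weight
            received[target] += weight
    return dict(sent), dict(received)


graph = build_graph(replies)
sent, received = degree_summary(graph)
print("sent:", sent)
print("received:", received)
```

Members who send many replies are the active ones, and the member who receives the most replies is a candidate for the main member of the group.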

Photo by Aaron Burden on Unsplash

Hands-on Tutorial

The explanation of the theory and its application in real problems

The basic theory of k-Modes

In the real world, data may contain different data types, such as numerical and categorical. To perform an analysis such as clustering, we should consider the data types present in the data we have. k-Means is commonly used for clustering and is efficient on large data, but it only works for numerical data and is not suitable for data that contains categorical variables. Huang therefore proposed an algorithm called k-Modes, which is designed to cluster categorical data.
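
To make the idea concrete, the sketch below shows the two building blocks usually associated with k-Modes: a simple matching dissimilarity (the number of attributes on which two records differ) and a cluster centre defined as the per-attribute mode. It runs a single assign-and-update step on made-up records and is not a full implementation. The helper names are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Most frequent category for each attribute across the records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def assign(records, modes):
    """Index of the nearest mode for each record."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(record, modes[k]))
        for record in records
    ]


# Made-up categorical data: (colour, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]

# Start from two records as initial modes, then do one assign/update step.
modes = [records[0], records[3]]
labels = assign(records, modes)
new_modes = [
    cluster_mode([r for r, label in zip(records, labels) if label == k])
    for k in range(len(modes))
]
print(labels)
print(new_modes)
```

A full run would repeat the assign and update steps until the labels stop changing.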

The modification of k-Modes…

Photo by Luke Michael on Unsplash

Hands-on Tutorial

Create an education index from Indonesia’s Central Statistics Agency data 2020

Policymakers are required to formulate comprehensive policies and be able to assess the areas that need improvement. Using a composite index, the indicators are aggregated and each area can be ranked for evaluation.

The idea behind the Factor Analysis (FA)

Factor analysis is a linear statistical model that aims to describe a set of m variables in terms of a smaller number of p factors and to highlight the relationship between these variables. Factor analysis is similar to Principal Component Analysis (PCA).
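
In symbols, one common way to write this model is the orthogonal factor model. The notation below is an assumption of this sketch rather than something taken from the article: x is the vector of m observed variables with mean μ, f is the vector of p common factors, L is the matrix of loadings, and ε collects the variable-specific errors.

```latex
x - \mu = L f + \varepsilon, \qquad
x, \mu, \varepsilon \in \mathbb{R}^{m}, \quad
L \in \mathbb{R}^{m \times p}, \quad
f \in \mathbb{R}^{p}, \quad p < m

\mathbb{E}[f] = 0, \quad \operatorname{Cov}(f) = I_p, \quad
\mathbb{E}[\varepsilon] = 0, \quad
\operatorname{Cov}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_m), \quad
\operatorname{Cov}(f, \varepsilon) = 0

\operatorname{Cov}(x) = L L^{\top} + \Psi
```

Under these assumptions, the covariance of the observed variables splits into a part explained by the shared factors and a part specific to each variable.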

Factor analysis helps the analyst interpret the variables in the data as a set of factors. Factors consist…

Photo by Jessica Ruscello on Unsplash

Hands-on Tutorial

How to understand and create customer segmentation using the RFM model and Pareto principle

Segmenting your customers can help you focus your marketing efforts, so you can increase profits and overall customer satisfaction. Learn steps and strategies to help get started!

Customer segmentation

Customer segmentation is a technique used to group customers into several segments, each with its own characteristics. Imagine a company whose marketing team is about to launch a campaign. Because resources are limited, the campaign must bring in as much revenue, or improve other metrics as much, as possible. Using customer segmentation, the marketing team can keep its focus on valued or potential customers. That's a brief idea of customer segmentation!
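
As a rough sketch of the raw ingredients of an RFM model, the code below computes recency (days since the last purchase), frequency (number of purchases), and monetary value (total spend) per customer. The transactions, the reference date, and the function name `rfm_table` are made up for illustration, and the scoring and segmentation steps are left out.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2021, 1, 5), 120.0),
    ("C1", date(2021, 3, 1), 80.0),
    ("C2", date(2020, 11, 20), 40.0),
    ("C3", date(2021, 2, 14), 300.0),
    ("C3", date(2021, 2, 28), 150.0),
    ("C3", date(2021, 3, 10), 50.0),
]


def rfm_table(rows, today):
    """Return {customer: {"recency", "frequency", "monetary"}}."""
    table = {}
    for customer, day, amount in rows:
        entry = table.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], day)
        entry["frequency"] += 1
        entry["monetary"] += amount

    return {
        customer: {
            "recency": (today - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in table.items()
    }


rfm = rfm_table(transactions, today=date(2021, 3, 31))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

Customers with low recency, high frequency, and high monetary value are the ones a campaign with limited resources would typically prioritize.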

According to Cooil et al…

Photo by Fabrizio Verrecchia on Unsplash

Hands-on Tutorial

How to accelerate the computation time of fuzzy string matching from hours to seconds

When working with real data, the biggest problems are mostly in data pre-processing. The problems vary, but matching is one of the biggest challenges faced by a lot of analysts. For instance, when we talk about George Washington and G Washington, we are of course talking about one person, namely the first President of the United States, so we are dealing with duplicate data. Luckily, researchers have developed probabilistic data matching algorithms, better known as fuzzy matching.
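
The George Washington example can be tried with the standard library's `difflib.SequenceMatcher`, which returns a similarity ratio between 0 and 1. This is only a minimal sketch of fuzzy matching in general: it is not necessarily the algorithm the article uses, and it does nothing to address computation time. The helper names and the candidate list are made up.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0 to 1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name, candidates):
    """Candidate with the highest similarity to name."""
    return max(candidates, key=lambda candidate: similarity(name, candidate))


candidates = ["George Washington", "Abraham Lincoln", "Thomas Jefferson"]

score = similarity("George Washington", "G Washington")
match = best_match("G Washington", candidates)
print(round(score, 2), match)
```

Comparing every record against every other record in this way grows quadratically with the number of records, which is why speed becomes a concern on real data.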

What is fuzzy string matching?

Probabilistic data matching, often referred to as fuzzy string matching, is an algorithm for matching a pattern between a string with…

Audhi Aprilliant
