Is this your first time hearing about image preprocessing and Optical Character Recognition (OCR)? Don’t worry: in this tutorial you will gain a basic understanding of both in one short article.
After reading this tutorial, you will understand image preprocessing, the basics of OCR, and an implementation of OCR in the Instagram app, built as a simplified example.
Keep reading the tutorial and don’t forget to follow each step!
Optical Character Recognition (OCR) is a tool that converts the characters or text in an image into an editable format such as txt or csv…
After reading this article, you will know what information sits behind the Indonesian ID card number and how to validate it properly according to government policy. An API built with Flask will automate the extraction and validation tasks.
This article is for educational purposes only. All the data is artificial, and you cannot get detailed information such as name, picture, full address, job, or blood type from an ID card number.
Enjoy, keep reading!
The Indonesian ID card, known in Bahasa Indonesia as KTP (Kartu Tanda Penduduk), is a single identity for residents who are over 17 years…
After reading this short tutorial, you will understand the calculation of Cohen’s Kappa and Fleiss’ Kappa. You will also be able to interpret the result and assign a level of agreement between raters.
Cohen’s Kappa is a metric used to measure the agreement between two raters. For instance, two raters are each asked to assign one of 3 labels (A, B, or C) to 10 participants based on the participants’ skills. Using Cohen’s Kappa, we can measure their level of agreement. Theoretically, Cohen’s Kappa is often used:
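To make the two-rater example concrete, here is a minimal sketch of the calculation using only the standard library. The function name `cohens_kappa` and the ten labels per rater are made up for illustration; the formula is the usual kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items that received the same label twice.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, multiply the two raters' marginal
    # proportions, then sum over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Artificial labels for 10 participants (illustration only).
rater_1 = ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"]
rater_2 = ["A", "B", "C", "A", "B", "B", "A", "C", "C", "A"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this made-up data the raters agree on 8 of 10 participants, so p_o = 0.8, while the marginal label counts give p_e = 0.34. Kappa is therefore lower than the raw agreement because part of that agreement is expected by chance.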
After working through this tutorial step by step, you will gain knowledge of and experience with Python script profiling: how to profile your own script or function, and how to determine which part of the function takes the most time to run.
When working in production, bugs are not the only concern with a Python script; execution time is another consideration. It becomes a performance issue as the data volume grows bigger and bigger. The production script must then be restructured and optimized to improve performance.
What if the scripts are too complex to check line by line…
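As a rough illustration of what profiling looks like in practice, the sketch below uses the standard library's `cProfile` and `pstats` modules. The functions `build_squares`, `sort_descending`, and `pipeline` are hypothetical stand-ins for a real production script, not code from the tutorial.

```python
import cProfile
import io
import pstats


def build_squares(n):
    # Hypothetical workload: build a list of squares.
    return [i * i for i in range(n)]


def sort_descending(values):
    # Hypothetical workload: sort the list in reverse order.
    return sorted(values, reverse=True)


def pipeline(n=100_000):
    values = build_squares(n)
    return sort_descending(values)


# Collect timing data only while the pipeline runs.
profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Write the report to a string, sorted by cumulative time per function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The report lists each function with its call count and timings, so the slow part of a script can be found without reading it line by line.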
In this tutorial, we will talk about the analysis of user interaction within a WhatsApp group. How active are the WhatsApp group members? Or how passive are they? Who’s the main member of the group? Is there any grouping?
Social network analysis (SNA) is a form of descriptive analytics that studies and collects information about the interactions between individuals within a specific group (such as a social media group). SNA is also a powerful tool for recognizing and identifying changes in group structures.
In the real world, data often comes in mixed types, such as numerical and categorical. To perform a certain analysis, for instance clustering, we should consider the data types in the data we have. The clustering algorithm most commonly used, and one that works efficiently on large data, is k-Means. However, it only works for numerical data and is not suitable for data that contains categorical types. To address this, Huang proposed an algorithm called k-Modes, which was created to handle clustering with categorical data.
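The idea behind k-Modes can be sketched in a few lines of plain Python. Distance is the number of mismatched attributes, and each cluster centre is the per-attribute mode instead of the mean. This is a simplified illustration, not a reference implementation: the function names, the toy records, and seeding the modes with the first k distinct records are all assumptions made here.

```python
from collections import Counter


def mismatch_count(record, mode):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(record, mode))


def k_modes(records, k, max_iter=20):
    """Cluster tuples of categorical values; returns (labels, modes)."""
    # Assumption for this sketch: seed the modes with the first k distinct records.
    modes = list(dict.fromkeys(records))[:k]
    if len(modes) < k:
        raise ValueError("need at least k distinct records")

    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the mode it mismatches least.
        new_labels = [
            min(range(k), key=lambda j: mismatch_count(rec, modes[j]))
            for rec in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels

        # Update step: each mode takes the most frequent category per attribute.
        for j in range(k):
            members = [rec for rec, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes


# Toy categorical records (colour, size, shape), made up for illustration.
records = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
]

labels, modes = k_modes(records, k=2)
print(labels)
print(modes)
```

On this toy data the records split into a "red" cluster and a "blue" cluster. On real data the result depends heavily on how the initial modes are chosen, so this seeding is only a convenience.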
Policymakers are required to formulate comprehensive policies and to assess the areas that need improvement. Using a composite index, the indicators are aggregated and each area can be ranked to produce an evaluation.
Factor analysis is a linear statistical model that aims to describe a set of m variables in terms of a smaller number of p factors and to highlight the relationship between these variables. Factor analysis is similar to Principal Component Analysis (PCA).
Factor analysis helps the analyst interpret the variables in the data as a set of factors. Factors consist…
Segmenting your customers can help you focus your marketing efforts, so you can increase profits and overall customer satisfaction. Learn steps and strategies to help get started!
Customer segmentation is a technique used to group customers into several segments, each with its own characteristics. Imagine a company whose marketing team is about to launch a campaign. Because resources are limited, the campaign must bring in as much revenue (or improve other metrics) as efficiently as possible. Using customer segmentation, the marketing team can keep its focus on valued or potential customers. That's the basic idea of customer segmentation!
According to Cooil et al…
When working with real data, the biggest problems are mostly in data pre-processing. They vary, but matching can be one of the biggest challenges analysts face. For instance, George Washington and G Washington obviously refer to one person, namely the first President of the United States, so we are dealing with duplicate data. Luckily, researchers have developed probabilistic data matching algorithms, better known as fuzzy matching.
Probabilistic data matching, often referred to as fuzzy string matching, is an algorithm to match a pattern between a string with…
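To get a feel for the George Washington example, here is a small sketch that scores string similarity with `difflib.SequenceMatcher` from the Python standard library. This is a stand-in chosen for illustration and may not be the matching algorithm the article goes on to describe. The helper names, the candidate list, and the 0.7 threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity score between 0 and 1, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(name, candidates, threshold=0.7):
    """Return (candidate, score) for the closest candidate, or None if too dissimilar."""
    if not candidates:
        return None
    scored = [(candidate, similarity(name, candidate)) for candidate in candidates]
    candidate, score = max(scored, key=lambda pair: pair[1])
    # The threshold is an arbitrary cut-off for this sketch; tune it on real data.
    return (candidate, score) if score >= threshold else None


# Hypothetical reference list of canonical names.
known_names = ["George Washington", "John Adams", "Thomas Jefferson"]

match = best_match("G Washington", known_names)
print(match)
```

Here "G Washington" is close enough to "George Washington" to pass the threshold, while a string with nothing in common with any candidate returns `None`.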