Classification of Vulnerability Levels of Coastal Areas on the …

Indonesia has many beaches that attract tourists, including those on the north coast of Java. In addition to their visual appeal, these coasts support settlements, agriculture, fisheries, ports, ponds, and other resources. However, they are threatened by coastal damage caused by, among other things, wave action, tides, abrasion, and tidal flooding. For this reason, development of the north coast of Java must consider the potential for coastal damage based on the physical condition of the coast, and a system that can classify the vulnerability levels of coastal areas is needed. The Coastal Vulnerability Index (CVI) can be determined and classified using, for example, the Gornitz formula, or using data-driven models and machine learning based on coastal parameter data. This study demonstrates how the K-Nearest Neighbor (K-NN) algorithm can be used to classify the vulnerability level of coastal areas. The study uses 290 points (locations) along the north coast of Java. The parameters that determine coastal vulnerability are mean sea level (MSL), mean significant wave height (MSWH), mean tidal range (MTR), shoreline change, landform, and slope. Coastal vulnerability is classified into four levels: low, moderate, high, and very high. The K-NN system uses 80% of the data for training and 20% for testing, with K = 1 to 10. The test results show that the K-NN method is capable of classifying the vulnerability levels of the north coast of Java. For K = 1 to K = 10, with the training and test data randomized, the average accuracy ranges from 86.21% to 97.13%, with the best result obtained at K = 2. This is an open access article under the CC BY-SA license.


INTRODUCTION
Indonesia is an archipelago located between the Asian continent and the Australian continent, and between the Indian Ocean and the Pacific Ocean, at coordinates of 6°N to 11°S and 95°E to 141°E [1]. Indonesia has a coastline of more than 95,181 kilometers, making it the world's second longest behind Canada [2] [3]. Beaches in Indonesia, especially on the north coast of Java, have many attractions, particularly from a visual perspective. In addition, the coast has many other potentials such as residential areas, tourism, agriculture, fisheries, ports, ponds, and other resources.
However, behind this potential, the threat of coastal damage is growing over time, for example on the northern coast of Java. The damage is caused by wave action, tides, abrasion, and tidal flooding [4] [5]. This threat must be taken into consideration for the sustainable development of coastal areas. The potential for coastal damage can be classified based on the physical condition of the beach, which is described by physical coastal data such as ground level, sea level rise, significant wave height, coastal geomorphology, and so on [6].
In general, vulnerability is the degree to which a system's resilience decreases as a result of external influences. Coastal vulnerability describes the condition of a coastal area that can be subject to damage such as abrasion or accretion which, if it occurs repeatedly, will result in a decrease in the land area along the coast [7]. The level of coastal vulnerability can be classified into four categories, namely low, moderate, high, and very high. This classification is based on the coastal vulnerability index (CVI), which can be obtained using the Gornitz method (explained later in the article) from the following parameters: mean sea level (MSL), mean significant wave height (MSWH), mean tidal range (MTR), shoreline change, landform, and coastal slope. This study, however, examines whether the K-nearest neighbor (K-NN) algorithm can classify the vulnerability level, in this case for the northern coast of Java.
The rapid development of computing technology facilitates the use of machine learning in all areas, including the use of K-NN. Data mining discovers new patterns in very large datasets using techniques from statistics, machine learning, database systems, and artificial intelligence [14] [15]; these techniques are used to find clear relationships and produce conclusions that were not previously known or understood but are useful to data owners [10]. In general, data mining is used for two purposes, namely descriptive and predictive. Groups of methods that are often used in data mining include [11]: prediction, classification, estimation, clustering, and association. Meanwhile, the Knowledge Discovery in Databases (KDD) process comprises several stages: the database, the data warehouse, the mining process, patterns, and knowledge [8].
This study uses a classification process that finds a model by describing and distinguishing data classes, with the goal of using the model to predict the class of objects whose class is unknown [11]. The classification process has four main components [10]. Classes are the dependent variables that serve as labels (targets) of the classification results. Predictors are the data attributes that act as independent variables in the model. The training dataset contains the classes and predictors used to train the model, so that data can be grouped into the correct or appropriate class. The test dataset contains new data that the model classifies in order to determine the accuracy of the established model.
The usefulness of K-NN has been demonstrated in many cases, including classifying consumer interest in an insurance company [12] and determining the level of forest fire hazard using 252 data, with an accuracy rate of 80.16% [13]. In education, the authors of [14] conducted research aimed at assisting schools in determining which students will be accepted through zoning and non-zoning admission, based on the regulations of the Minister of Education and Culture. That research achieved an accuracy rate of 83.36% at K = 5. Other studies have used K-NN to classify the monthly sales level of Lombok Vape On [15], to select student thesis topics [16], and to classify news by topic such as health, sports, technology, and so on [17].
The K-NN algorithm is widely used in data mining for classification, estimation, and prediction [13]. K-NN is a supervised learning technique that works on labeled data: it stores all training data and delays the formation of a classification model until test data are presented for prediction. The method classifies a test data point according to the majority class among the training data closest to it. Data points that are close to each other are called "neighbors" [18]. The number of nearest neighbors considered is expressed as K, and the best K value can be determined by parameter optimization [19]. The lower the number of K chosen, the more data there is. However, the bigger the K number used, the larger the data dimension [20]. There are a number of techniques for computing the distance between neighbors, which is an important part of K-NN algorithms, such as Manhattan (city block) distance and Minkowski distance; the most widely used is the Euclidean distance, which measures the difference between two data points [21,22,23,24,25,26].
To summarize, this study aims to develop a system that can classify the vulnerability levels of coastal areas on the north coast of Java by implementing the K-Nearest Neighbor (K-NN) algorithm, and to obtain the accuracy and reliability of the system. This research is expected to assist users or policy-making authorities in classifying the vulnerability levels of coastal areas in general when the physical condition parameters of the coast are known, and thereby to assist the authorities in planning coastal area development.
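The nearest-neighbor vote described above can be illustrated with a minimal Python sketch. It is not the MATLAB program used in this study; the function names `euclidean` and `knn_predict` and the four toy training points are hypothetical, chosen only to show the mechanics of sorting by Euclidean distance and taking a majority vote.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_X, train_y, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank the training points by distance to the query, nearest first.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # On a tied vote, the label seen first (the nearer neighbor) wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: six class values (1 to 4) per point.
train_X = [
    [1, 1, 1, 2, 1, 1],
    [1, 2, 1, 1, 1, 2],
    [4, 4, 3, 4, 4, 4],
    [4, 3, 4, 4, 3, 4],
]
train_y = ["low", "low", "very high", "very high"]

prediction = knn_predict(train_X, train_y, [4, 4, 4, 3, 4, 4], k=3)
print(prediction)  # very high
```

With K = 3, the query has two "very high" neighbors and one "low" neighbor, so the majority vote returns "very high". The tie-breaking rule is a design choice of this sketch, not something stated in the study.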

MATERIALS AND METHOD
The materials used in this study were coastal physical parameter data taken at 290 points (locations) along the north coast of Java. These points run from Serang, Banten to Sidoarjo, East Java, as shown in Figure 1.
The equipment used in this study was a computer running Windows 10 Pro with a 10th-generation Intel Core i7 CPU @ 1.8 GHz, 16 GB RAM, a 500 GB HDD, and a 14" display. The software used was MS Excel 2021, MATLAB v14, and Global Mapper v13.
The physical parameters of the coast that are considered important determinants of coastal vulnerability are not always the same; they usually depend on the needs and location of the study. In the case of the north coast of Java, where many activities take place, erosion of the coastline must be considered. Damage to the northern coast of Java is attributed to wave height, sea level rise, tidal range, and coastal slope [6]. Several physical parameters of the coast are used to estimate coastal vulnerability. The physical data of the coast used in this study are tabulated in Table 1 [27,28,29,30,31,32]. The same parameters have also been used by the USGS in estimating the vulnerability of beaches in the United States to sea level rise. Meanwhile, the authors of [33] used six nearly identical parameters to determine the vulnerability index of beaches in Bangladesh, namely the average tide range, significant wave height, shoreline change, cyclone track density, slope of the coastal topography, and sea level rise.
After the physical parameter data are collected, the next step is to transform the data: the physical values of the coastal parameters are classified into vulnerability classes, namely low, moderate, high, and very high. The transformation was carried out according to [34] with a few scale adjustments, as shown in Table 2.
The level of coastal vulnerability is then expressed by the Coastal Vulnerability Index (CVI), which is determined using the Gornitz method. Each parameter is classified as low, moderate, high, or very high, on a scale of 1 to 4. The CVI value is the square root of the product of the class values of all parameters divided by the number of parameters [33], according to (1) [27,28,29,30,31]:

$$\mathrm{CVI} = \sqrt{\frac{a \times b \times c \times d \times e \times f}{6}} \qquad (1)$$

where a = mean sea level (MSL), b = mean significant wave height (MSWH), c = mean tide range (MTR), d = shoreline change (erosion/accretion rate), e = landform, and f = slope. Finally, the CVI category is obtained by transforming the calculated result according to Table 2.

The next stage is to classify the vulnerability level using the K-NN algorithm, after verifying that the data contain all CVI categories with a reasonable number of occurrences for each category [26,35,36,37]. In this case, the 290 data were used, each of which consists of the 6 coastal parameters and its CVI value. The data were divided into 80% for training and 20% for testing, with values of K = 1 to K = 10. Figure 2 shows the steps performed in this study.
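As a rough illustration of (1), the short Python sketch below computes the Gornitz index from six class values. The function name `gornitz_cvi` and the example class values are hypothetical; the thresholds that convert the CVI value into a category come from Table 2 and are deliberately not reproduced here.

```python
import math


def gornitz_cvi(class_values):
    """CVI = sqrt(product of the class values / number of parameters)."""
    product = math.prod(class_values)
    return math.sqrt(product / len(class_values))


# Hypothetical class values (1 to 4) for a..f:
# MSL, MSWH, MTR, shoreline change, landform, slope.
example = [2, 3, 1, 4, 2, 3]
cvi = gornitz_cvi(example)
print(round(cvi, 2))
```

With six parameters on a 1 to 4 scale, the index ranges from about 0.41 (all classes equal to 1) to about 26.13 (all classes equal to 4).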
In this study, K-NN uses the Euclidean distance between the parameters of each test data point and the corresponding parameters of each training data point [38], as given in (2).
The consistency of the K-NN performance is also tested several times, each time after shuffling (randomizing) the order of the data. Each round of training and testing therefore uses different sets of data points while maintaining the 80:20 ratio.
In this study, the system was designed using UML (Unified Modeling Language), which includes various elements such as diagrams, scenarios, and the GUI design [40] [41].
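The 80:20 split and the reshuffling between rounds can be sketched as follows. This is an illustrative Python sketch, not the study's MATLAB code; the helper name `split_80_20`, the use of integer IDs to stand in for the 290 records, and the random seed are assumptions made for the example.

```python
import random


def split_80_20(records, shuffle=False, seed=None):
    """Return (training, testing) lists using an 80:20 ratio."""
    items = list(records)
    if shuffle:
        # Randomize the order before splitting (seed fixed for repeatability).
        random.Random(seed).shuffle(items)
    n_train = round(0.8 * len(items))
    return items[:n_train], items[n_train:]


# Integer IDs standing in for the 290 data points (assumption for illustration).
ids = list(range(1, 291))

seq_train, seq_test = split_80_20(ids)                        # original order
rnd_train, rnd_test = split_80_20(ids, shuffle=True, seed=1)  # randomized order
print(len(seq_train), len(seq_test))
```

In the sequential case the first 232 records form the training set and the remaining 58 the test set; in the randomized case the sizes stay the same but the membership of each set changes from round to round.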

RESULTS AND DISCUSSION
For the 290 data along the north coast of Java, the CVI is calculated using the Gornitz formula, which results in the following CVI class distribution: 91 low, 105 moderate, 67 high, and 27 very high, as shown in Figure 3. In this study, we used a K-NN algorithm with values of K ranging from 1 to 10 on the 290 data, split into 80% (232) for training and 20% (58) for testing. The attributes used are MSL, MSWH, MTR, Shoreline Change (SC), Landforms, and Slope.
Each data point is assigned an identity (ID) from 001 to 290, sequentially from west to east. The data point with ID 001 is located in Serang Regency, Banten, while the data point with ID 290 is in Sidoarjo Regency, East Java. Therefore, for the initial case, the training data consist of the data from Serang to a location in Tuban (001-232), while the testing data are those located from Tuban to Sidoarjo (233-290). Table 3 and Table 4 show the training data and test data in their original form (in physical quantities). Meanwhile, Table 5 and Table 6 show the transformed training data and testing data (in categories). As mentioned earlier, the CVI category is obtained by transforming the result of the Gornitz formula (1), which is calculated using the transformed values of the parameters (Table 2).
For the purposes of this study, a K-NN classifier program with a GUI was written in MATLAB and applied to the prepared data. Figure 4 shows the display after successful data entry, and Figure 5 shows the resulting display of the K-NN classification with K=1. When the RUN button is pressed, the entered data are used in their original order. The training data (232 data) are those from ID001 to ID232, and the test data (58 data) are those from ID233 to ID290. Meanwhile, to test the reliability of the system and the validity of the classification results, the user can randomize the entered data by selecting the RANDOMIZE button and then the RUN button. In this case, the 232 training data and 58 test data are no longer in the order in which they were originally entered.

Accuracy Result of K-NN classification
The Euclidean distance from each test data point to the training data is the basis of the K-NN calculation in this study. After the Euclidean distances are obtained, they are sorted in ascending order. The CVI class of each test data point is then determined from its K nearest neighbors.
Using sequential data, the results of the K-NN classification are shown in Table 7. In all cases, 80 percent (232 data) and 20 percent (58 data) were used for training and testing respectively. Using values of K=1 to K=10, an average accuracy of 87.07% was obtained, with the highest accuracy of 91.38% at K=3 and K=4. In this case, the classification results of the K-NN algorithm showed that 5 out of 58 test data do not match Gornitz's vulnerability data (CVI), namely data ID 248 (Lamongan), 266 (Gresik), 271 (Surabaya), 279 (Surabaya), and 281 (Surabaya). Figure 6 shows an illustration of the results of this test.
The consistency of the K-NN performance has also been tested by running the system for K=1 to K=10, randomizing the data sequence 3 times for each value of K. Table 8 shows the accuracy of the K-NN classification with random data.
Based on Table 8, the tests using random data for K=1 to K=10, each carried out 3 times, resulted in an average accuracy rate of 91.84%. The highest average accuracy rate of 97.13% is obtained at K = 2. This shows that the K-NN system performs consistently for different sets of test data. Even though the CVI value can be determined directly using the Gornitz method, this study demonstrates that machine learning algorithms such as K-NN can be used to classify coastal vulnerability levels.
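The highest accuracy reported above follows directly from the number of mismatches: 5 of the 58 test data differ from the Gornitz class, so 53 are classified correctly. A small Python check, with the hypothetical helper name `accuracy_percent`, makes the arithmetic explicit.

```python
def accuracy_percent(n_correct, n_total):
    """Share of correctly classified test data, as a percentage."""
    return 100.0 * n_correct / n_total


n_test = 58      # test data (20% of 290)
n_mismatch = 5   # test data whose K-NN class differs from the Gornitz class
best = accuracy_percent(n_test - n_mismatch, n_test)
print(f"{best:.2f}%")  # 91.38%
```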
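The repeated randomized test can be sketched in Python as a loop over K with three shuffled 80:20 splits per value. This is only an illustration under stated assumptions: the data are synthetic (generated by the hypothetical `make_data`, with made-up label thresholds), a fresh shuffle is drawn for every run, and the function names are not taken from the study's MATLAB program. The printed accuracies therefore have no relation to Table 8.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """train: list of (features, label); majority vote among k nearest."""
    ranked = sorted(train, key=lambda rec: math.dist(rec[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy(train, test, k):
    """Percentage of test records whose predicted label matches."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return 100.0 * hits / len(test)


def sweep(data, ks, repeats, seed=0):
    """Mean test accuracy per K over `repeats` random 80:20 splits."""
    rng = random.Random(seed)
    n_train = round(0.8 * len(data))
    results = {}
    for k in ks:
        scores = []
        for _ in range(repeats):
            shuffled = data[:]
            rng.shuffle(shuffled)
            scores.append(accuracy(shuffled[:n_train], shuffled[n_train:], k))
        results[k] = sum(scores) / repeats
    return results


def make_data(n=100, seed=42):
    """Synthetic stand-in data, NOT the study's data; thresholds are made up."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        features = [rng.randint(1, 4) for _ in range(6)]
        total = sum(features)
        if total <= 12:
            label = "low"
        elif total <= 15:
            label = "moderate"
        elif total <= 18:
            label = "high"
        else:
            label = "very high"
        data.append((features, label))
    return data


results = sweep(make_data(), range(1, 11), repeats=3)
best_k = max(results, key=results.get)
for k, acc in results.items():
    print(f"K={k}: mean accuracy {acc:.2f}%")
print("best K:", best_k)
```

Averaging over several shuffles, as in Table 8, reduces the dependence of the reported accuracy on one particular choice of test locations.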

CONCLUSIONS
The K-NN method has been applied to vulnerability level classification on the north coast of Java, using 290 data divided into 80% for training and 20% for testing. For the 58 test data (cases), the system predicts a low CVI at 17 locations, moderate at 19 locations, high at 15 locations, and very high at 7 locations. These results agree with the Gornitz vulnerability index (CVI), except for one location in Lamongan, one in Gresik, and three in Surabaya. The average accuracy of the K-NN classification test with sequential data was 87.07%, with the highest accuracy of 91.38% for K=3 and K=4. In testing with random data, the average accuracy of the K-NN classification test was 91.84%, with the highest accuracy of 97.13% at K=2. To improve the accuracy, further studies may consider reducing the study area, for example by applying the same parameters to a coastal cell about 100-200 km long.

Figure 1. The distribution of coastal parameter data along the north coast of Java
Symbols in (1): a = mean sea level (MSL); b = mean wave height (MSWH); c = mean tide range (MTR); d = shoreline change (erosion/accretion rate); e = landform

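$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2)$$
where $x_i$ and $y_i$ represent the values of the training and test data respectively, $i$ is the index of the data value (parameter) being compared, $d(x, y)$ is the distance between the training and test data, and $n$ is the number of data values (parameters) compared. Meanwhile, the accuracy of the classification results is calculated using (3) [39].
$$\text{Accuracy} = \frac{\text{number of correctly classified test data}}{\text{total number of test data}} \times 100\% \qquad (3)$$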

Figure 2. Steps of research

Figure 3. The distribution of CVI data for the K-NN Classification System

Figure 4. Input Data on The GUI Program

Figure 6. The comparison of the CVI from K-NN vs CVI Gornitz

Table 3. Training Data with Parameter (Attribute) Values

Table 4. Test Data with Parameter (Attribute) Values

Table 5. Classification of Training Data, With CVI Values According to Gornitz

Table 6. Classification of Test Data, With CVI Values According to Gornitz

Table 7. The accuracy of K-NN testing using the initial data sequence

Table 8. The accuracy of the K-NN testing with random data sequences, for values K=1 to 10