Introduction

Building a new energy power system is one of the important ways to achieve the goal of carbon peaking and carbon neutrality1. In the process of power system transformation, new energy power represented by water conservancy and hydropower is incorporated into the power grid system in a high-speed and large-scale manner2. Under the new operating environment, grid-connection hydropower stations must undertake comprehensive utilization tasks such as power generation, shipping, and flood control3. Hydropower is a renewable energy source and the process of water circulation is continuous. The introduction of hydroelectricity in water distribution networks allows for the sustainable use of energy by utilizing the kinetic energy of water flow to generate electricity. This helps to promote the goal of sustainable development and reduce dependence on finite energy resources4. This is an important support for the construction of a new power system under the dual carbon goals and a reliable guarantee for the efficient use of water resources. At the same time, hydropower stations are facing both higher intensity and frequency of grid peaking and frequency regulation tasks. The requirements from both water conservancy and electric power generation also greatly limit the space for the optimization of scheduling operations of hydropower stations, easily causing problems such as unbalanced scheduling, inadequate benefits and poor ship navigation. How to better coordinate power scheduling and navigation quality, further improve the joint optimal scheduling benefits of hydropower stations and reduce the waiting time of ships passing through the gates are the key to ensure the effectiveness of the hydropower stations.

Optimal scheduling of hydropower stations with dual generation-navigation objectives is one of the classical problems in the direction of optimal scheduling of engineering systems. To solve this kind of problem, it is necessary to formulate an optimal scheduling problem into a multi-dimensional, nonlinear, multi-stage, strictly constrained optimization problem5,6. Under the condition of satisfying the constraints and the load demand of the power grid, guided by the methods and theories in the field of operation research, the scheduling strategy is used to obtain the scheduling rules with better shipping benefits.

The essence of optimal scheduling of hydropower stations for power generation and shipping is to build and solve mathematical models, which often consist of two parts, scheduling objectives and constraints. Mathematical models are generally divided into the following categories: (1) “water to determine electricity” with maximum generation capacity and maximum generation efficiency, (2) “electricity to determine water” with minimum water consumption and maximum end-of-period energy storage, (3) downstream ecological protection with minimal ecological impact as the objective function. The constraints are also very complex and diverse. The solutions are usually divided into two categories: mathematical programming and intelligent optimization algorithms. Mathematical programming mainly includes Linear Programming7, Nonlinear Programming8, Mixed Integer Programming9, and Dynamic Programming10. With the development of operation research, economics and new theories and the improvement of computational difficulty and computational accuracy of hydropower station scheduling, bionic intelligent optimization models based on Genetic Algorithm11, Particle Swarm Algorithm12, Ant Clony Optimization13 emerged.

Since the difference between the actual arrival time of ships and the declared arrival time might be large, it often increases the waiting time for the ship to pass the gate, making the scheduling effectiveness of the hydropower station low. Meanwhile, the power generation capacity of hydropower stations are not stable due to the restrictions on discharge flow. Currently, most of the existing studies focus on the selection and improvement of optimization strategies and optimization algorithms, without much research on the impact of real-time ship passage through the gates in hydropower stations on the optimization of hydropower stations scheduling. The role of energy storage devices in stabilizing power generation is also not considered in most of the existing studies.

Therefore, our study is to improve the operational effectiveness of the grid connected hydropower stations and to balance the benefits of shipping and power generation effectively. In the actual scheduling process we need to consider the difference between the actual arrival time of ships and the declared arrival time, as well as the shortage or surplus of power generation in hydropower stations. This paper proposes a new multi-objective real-time scheduling model to solve the joint scheduling problem of hydropower generation and shipping by using prediction algorithm, energy storage and intelligent optimization. We also apply the method in a real application of the Wujiang River as a case study. The main contribution of this paper is summarized as follows.

  1. (i)

    Energy storage is introduced in the scheduling process of hydropower stations in order to stabilize the power generation. If the power generation during the scheduling time period is higher than the corresponding grid load demand, the energy storage device is charged, and conversely, if the power generation is insufficient, the storage device is discharged to ensure that the grid power load demand is satisfied.

  2. (ii)

    The power generation during the scheduling time period is analyzed where a new objective function considering the energy storage is proposed. For the function, we consider the factors that might affect shipping and power generation effectiveness such as water level, flow rate and other constraints.

  3. (iii)

    An improved multi-objective beluga optimization algorithm, namely NSBWO, is proposed to encode and schedule the solution for the discharge flow during the scheduling time of the hydropower stations.

The structured rest of the paper is as follows: “Related work”  discusses existing studies regarding the scheduling of hydropower stations. “Ship estimated arrival time prediction”  presents the XGBoost model to predict the expected arrival time of ships. “The proposed real-time scheduling model”  analyzes the constraints in the scheduling process and introduces the improved NSBWO scheduling model. “Experimental discussion”  evaluates our model with multiple metrics, providing the insights of the proposed method. Finally, “Conclusion”   concludes the paper with future work.

Related Work

In recent years, quite a few studies have been proposed on the real-time scheduling of hydropower stations. For example, Yang et al.14 explored the particle swarm optimization algorithm for hierarchical multi-objective optimization problems and its application to the optimal operation of hydropower stations, and optimized the scheduling with the two objectives of maximizing the peak energy efficiency and maximizing the power generation of hydropower stations. Hidalgo et al.15 combined the evolutionary and the gradient algorithms to propose a model for optimizing the short-term operation of hydropower stations. Jia et al.16 proposed a scheduling model considering both daily maximum generation and navigation demand, and obtained the optimal solution by genetic algorithm. Meng et al.17 proposed an improved multi-objective labeled cuckoo search algorithm based on a constrained transformation population initialization strategy to solve the trade-off between water and energy in the context of the Xiaolangdi and Xi’an terrace hydropower stations in the lower Yellow River in China. Fang et al.18 proposed a scheduling model for reservoir ecological protection and power generation maximization in the context of the Minjiang River Shuikou Hydropower Stations in China, and solved it using an improved multi-objective particle swarm optimization algorithm. Marcelino et al.19 proposed an efficient mathematical model for hydropower stations scheduling that is solved using a coral optimization algorithm with different search operators in a single population. Feng et al.20 proposed a new multi-objective particle swarm optimization algorithm with the goal of maximizing power generation. The method introduces recursive mapping, which makes the initial population evenly distributed in the problem space, and the inertia weight and learning coefficient change dynamically with the back generation. Marcelino et al.21 used a multi-objective evolutionary group hybrid algorithm to solve the short-term hydropower unit configuration problem and compared it with other optimization methods. Chen et al.22 developed a coordination model of hydropower and ecological flow using three models NSGA-II, NSGA-III and RVEA. After comparison, the NSGA-III model was found to have certain advantages. Wang et al.23 took the minimum load peak-to-valley difference of multiple power grids as the objective function. Transform the original nonlinear non-convex model into a standard mixed-integer linear programming (MILP) formulation through several linearization strategies.

Most of the existing studies aimed at maximizing generation capacity and efficiency through static scheduling during the scheduling time period. Few considers the hydropower stations that have both shipping and power generation demands, and the application of energy storage combined with hydropower generation in stabilizing grid peaking.

Ship estimated arrival time prediction

Research purpose

In order to realize the combination mode of “weekly sluice plan declaration + daily sluice plan control + sluice real-time scheduling” of the Wujiang Silin Hydropower Station in Guizhou province of China. The scheduling center needs to use technical means such as the shipboard Beidou system24 and video surveillance to obtain the real-time position, speed and other characteristics of the ship. And real-time tracking, analysis and calculation of the ship’s estimated arrival time at the port.

Figure 1
figure 1

The flow chart for estimated time of the arrival solution.

Full size image

Predictive model

To obtain datasets of ship navigation on the Wujiang channel, we use the Beidou system24 to track ships and the planning data of ship passage. We collect the data every six minutes with timestamps. The characteristics of the data include ship identification number (i.e., mmsi), current latitude and longitude coordinates (i.e., lat, lon), speed (i.e., speed), ship type (i.e., (ship_type)), origin, destination, destination latitude and longitude coordinates (i.e., (end_lat), (end_lat)), the actual departure time, the actual arrival time, and whether reached the destination.

To make the predicted time more accurate, the coordinates under each timestamp are used to calculate the distance traveled by a ship. In order to facilitate the establishment and prediction of the model, we convert the timestamp in the data to date type. And subtract the departure time from the timestamp to get the time that the ship has traveled and convert the duration to seconds. In the process of dividing the data set, we randomly divided the training set and the test set, and the size (test_size) of the test set was selected as 0.08. Finally, train on historical data and make predictions by building a predictive model. The specific solution flow chart is shown in Fig. 1.

The timestamp interval of adjacent data in the dataset that we collected is short, which can accurately grasp the various characteristics of the ship during its travel. During the solution process, the estimated ship arrival time is transformed into a regression prediction problem by utilizing the time stamp. Extreme Gradient Boosting (XGBoost)25 works efficient on small to medium datasets and can adaptively learn the weights of each decision tree. It has good generalization performance and is widely used in the field of data science. It can effectively solve the problem of predicting the expected arrival time of ships. So we choose the XGBoost model for prediction. XGBoost is an extreme boosting tree model based on the Boosting integrated learning algorithm, and an integrated gradient boosting algorithm based on a decision tree. Composed of many classification and regression trees (“weak learners”), the data set used by each regression tree is the entire data set, and the generation of each tree can be regarded as a single complete regression tree generation process. There is a sequence between trees. The results of each previous regression tree affect the prediction result of the next regression tree, that is, the latter regression tree is affected by the deviation of the previous regression tree during the prediction process. In each iteration, a new weak learners adjusts the model for the next iteration based on the difference between the previous model’s predictions and the true values. The core idea of Boosting is to sum the results of all weak classifiers to the predicted value, and then the next weak classifier fits the error function to the gradient of the predicted value (the error between the predicted value and the true value), thus continuously reducing the residuals until the error requirement of the system is satisfied. Figure 2 illustrates the process.

Figure 2
figure 2

XGBoost regression prediction model.

Full size image

The proposed real-time scheduling model

Mathematical model

Analyze shipping efficiency

The shipping safety of the downstream navigation channel of the Silin Hydropower Station is mainly relevant to the water level change, flow rate, water level change rate, and downstream flow height. Among them, downstream water level and daily water level variation, and water flow rate have great correlation with discharge flow. The increase or decrease of the downstream flow affects the downstream water level and flow rate, while water level and downstream flow change rates are also related to the discharge flow. Therefore, the main factor affecting downstream channel safety is the discharge flow. Table 1 presents the specific factors that impact navigation safety.

Table 1 Factors affecting the efficiency of shipping.
Full size table

Analyzing the historical data of the Silin Hydropower Station, the navigation assurance rate is more than 0.9 when the discharged flow rate of the Silin Hydropower Station is 0-200 m3/s; and more than 0.8 when it is 200–400 m(^3)/s. With the increase of the discharge flow, the navigation assurance rate decreases. The relationship between the specific Silin Hydropower Station discharge flow and the navigation assurance rate is shown in Fig. 3.

Figure 3
figure 3

Relationship between discharge flow and navigation assurance rate.

Full size image

Constraints

The constraints of the Silin hydropower station scheduling system are illustrated in the section.

Water level restriction in the operation of hydropower stations: in order to ensure the stability of hydropower station operations and the safety of downstream areas, we must impose restrictions on water level heights in the runtime of the stations. We thus have

$$begin{aligned} Z_{min }^j le {Z^j} le Z_{max }^j, end{aligned}$$
(1)

where (Z^{j}) is the operating water level of the power stations at time j; (Z_{max}^{j}) and (Z_{min}^{j}) are the upper and lower limits of the operating water level of the power station at time j, respectively. The water level trend of the Silin Hydropower Station is shown in Fig. 4, in which the reddish brown line indicates the upper limit of water level operation of the Silin Hydropower Station and the blue line indicates the lower limit of water level operation of the Silin Hydropower Station.

Figure 4
figure 4

The trend of tailwater level and upstream water level.

Full size image

Hydropower station discharge flow constraints: discharge flow and downstream navigation benefits and downstream ecological stability are closely related to the need to meet certain conditions.

$$begin{aligned} Q_{min }^j le Q_{}^j le Q_{max }^j, end{aligned}$$
(2)

where (Q^j) is the discharge flow of the power station at time j; (Q_{max }^j) and (Q_{min }^j) are the maximum and minimum discharge flow of the power station to meet the shipping requirements at time j. The flow trend of the Silin Hydropower Station from October 1 to 7, 2021 is shown in Fig. 5.

Figure 5
figure 5

The flow trend of the Silin Hydropower Station.

Full size image

Capacity constraints of hydropower stations: In order to protect hydropower units and maintain the stability of grid peaking, the capacity needs to be within a certain range.

$$begin{aligned} P_{min }^j le P_{}^j le P_{max }^j, end{aligned}$$
(3)

where (P_{max }^j) and (P_{min }^j) are the maximum and minimum allowable output of a station.

Downstream water level variation constraint: the downstream water level must not only meet the maximum and minimum water levels but also the value of water level variation per unit by time. We have

$$begin{aligned}{} & {} Delta {Z_{{rm d}}} le {Z_{rm{{d_max}}}}, end{aligned}$$
(4)
$$begin{aligned}{} & {} Delta {Z_h} le Delta {Z_{h_max }}, end{aligned}$$
(5)
$$begin{aligned}{} & {} {Z_{15min }} le {Z_{15min _max }}. end{aligned}$$
(6)

In the equations, (Z_{rm {d}}), (Z_h), and (Z_{15min }) are daily, hourly, and 15-minute variations of downstream water levels, respectively. (Z_{rm{{d_max}}}) , (Z_{h_max }) , (Z_{15min _max }) are the corresponding maximum values.

Ship passing restriction: When a ship passes during the scheduling period, the load is required not to change, so the discharge flow during this scheduling period should be consistent with the discharge flow at the previous scheduling time. At the same time, the water storage capacity plus power generation at this time should be greater than or equal to the grid load demand. Given (R_{rm{{Storage Capacity}}}^t) as the storage capacity at time t, (F_{rm{{Hydroelectricity}}}^t) as the power generation of the hydropower station at time t, and (F_{rm{{Electricity demand}}}^t) as the grid load demand at time t, we thus have

$$begin{aligned} R_{rm{{Storage Capacity}}}^t + F_{rm{{Hydroelectricity}}}^t ge F_{rm{{Electricity demand}}}^t. end{aligned}$$
(7)

Non-negative conditional constraint: all the above variables must be non-negative.

Objective function

In the study , we consider to store the power generated by hydropower stations to the grid, and to discharge the storage system whenever the power generated by hydropower stations cannot meet the demands of the grid. In order to maximize the benefits of the hydropower station, our objective is to generate power to meet the grid demands through the downstream flow. We also aim to reduce the negative impact on the ship navigation as the downstream flow might stop the shipping. Therefore, in this paper, we have two objectives, namely the power generation and the shipping scheduling.

Power generation scheduling goal: we aim to maximize the total stored energy at the end of the scheduling time period, and the total stored energy is accumulated from the stored energy within each time period. The amount of energy stored in each time period is the sum of the amount of power generated in the time period plus the amount of energy stored, and minus the load demand on the grid in the period. The power generation capacity of a hydropower station is related to the discharge flow, which is expressed by the “water to determine electricity” method, shown as follows.

The stored energy in the tth time period is:

$$begin{aligned} {mathrm{{F}}_t} = left. {left{ {9.81{P^t}Delta t + {R^{t – 1}} – F_{rm{{Electricity demand}}}^t} right. } right} , end{aligned}$$
(8)
$$begin{aligned} R_{}^{t – 1} = 9.81sum limits _{j = 1}^{t – 1} {(P_{}^j} Delta t – F_{rm{{Electricity demand}}}^j), end{aligned}$$
(9)

where (P^t) represents the output of the hydropower station at time t, (R^{t-1}) represents the stored energy at (t-1), and (Delta t) is the calculated time interval. (F_{rm{{Electricity demand}}}^t) represents the grid load demand at time t. The outgoing force in the tth time period is:

$$begin{aligned} {P^t} = {H^t}{Q^t}, end{aligned}$$
(10)

where (H^t) and (Q^t) are the power output, head and generation quoted flow of the power station at time t, respectively. We set the initial stored energy to zero and the generation scheduling objective function F1:

$$begin{aligned} Mmathrm{{ax }}{mathrm{{F}}_1} = left. {left{ {9.81sum limits _{j = 1}^T {H_{}^jQ_{}^j} Delta t – F_{rm{{Electricity demand}}}^mathrm{{j}}} right. } right} . end{aligned}$$
(11)

Shipping scheduling goal: The mapping relationship between the discharge flow of a hydrapower station and the navigation assurance rate is known, and the navigation assurance rate is considered as the evaluation index of the shipping efficiency of a channel, and the larger its value represents the better the shipping efficiency. The shipping scheduling function is as follows:

$$begin{aligned} Mmathrm{{ax }}{F_2} = frac{1}{T}sum limits _{j = 1}^T {{k_j}} left( {{Q^j}} right) , end{aligned}$$
(12)

where, F2 is the downstream channel navigation scheduling objective function; (k_j) is the guaranteed rate of navigation in the j-th time period; (Q^j) is the discharge flow of the hydropower station in the jth time period.

NSBWO optimization algorithm

NSBWO is an algorithm that combines the BWO26 algorithm and the NSGA-II27 algorithm and is oriented to multi-objective optimal scheduling. When evaluating the population, fast non-dominated sorting and crowding degree calculation are used, BWO algorithm is used when generating the offspring population, and finally the elite strategy is introduced to screen the offspring to retain excellent individuals.

There are three main differences between the NSGA-II algorithm and the traditional genetic algorithm. The first point is to perform fast non-dominated sorting when selecting individuals, which reduces the time complexity of algorithm operation and improves computational efficiency. The second point is to adopt elitism. After the generation of offspring is merged with the parent generation to obtain a new population, a fast non-dominated sort is performed, which increases the probability of retaining an excellent population. The third point is that the method of calculating the crowding distance is used as a criterion for selecting the best among individuals of the same class, which ensures the diversity of the population.

The BWO algorithm simulates three behaviors of beluga whales: exploration, exploitation, and falling. The transition between the three behaviors depends on the balance factor (B_{rm {f}}) , which is defined as:

$$begin{aligned} {B_{rm {f}}} = {B_0}(1 – T(2{T_{max }})), end{aligned}$$
(13)

where T is the current number of iterations, (T_{max }) is the maximum number of iterations, and B0 varies randomly between (0,1) in each iteration when (B_{rm {f}} > 0.5) for the exploration phase and when (B_{rm {f}}< 0.5) for the exploitation phase. As T increases, the range of (B_{rm {f}}) decreases from (0,1) to (0,0.5), and the probability of the exploitation phase increases.

The location of the beluga whales during the exploration phase is updated as follows:

$$begin{aligned}{} & {} X_{i,j}^{T + 1} = X_{i,{p_j}}^T + (X_{r,{p_1}}^T – X_{i,{p_j}}^T)(1 + {r_1})sin (2pi {r_2}),j = even. end{aligned}$$
(14)
$$begin{aligned}{} & {} X_{i,j}^{T + 1} = X_{i,{p_j}}^T + (X_{r,{p_1}}^T – X_{i,{p_j}}^T)(1 + {r_1})cos (2pi {r_2}),j = mathrm{{odd}}. end{aligned}$$
(15)

(X_{i,j}^{T + 1}) is the new position of the ith beluga in the jth dimension, ({P_{text{j}}}(j = 1,2,ldots ,d)) is a random integer chosen from the d-dimension, (X_{i,{p_j}}^T) is the position of the ith beluga in the (p_j) dimension, (X_{r,{p_1}}^T) is the current positions of the 1st and r-th beluga, respectively (r is a random number), and the range of random operators used by (r_1) and (r_2) for the augmented exploration phase is (0, 1).

The exploitation phase location updates are as follows:

$$begin{aligned} X_i^{T + 1} = {r_3}X_{best}^T – {r_4}X_i^T + {C_1}{L_F}(X_r^T – X_i^T), end{aligned}$$
(16)

where (X_i^T) and (X_r^T) are the current position of the ith beluga and the r-th beluga, respectively. (X_i^{T + 1}) is the new position of the i-th beluga, (X_{best}^T) is the best position, (r_3) and (r_4) are random numbers between (0,1), and (C_1) is the random jump intensity measuring the strength of the Levy flight. (L_F) is the Levy flight function.

The falling stage locations are updated as follows:

$$begin{aligned} X_i^{T + 1} = {r_5}X_i^T – {r_6}X_r^T + {r_7}X_{step}^{}, end{aligned}$$
(17)

where (r_5), (r_6) and (r_7) are random numbers between (0,1) and (X_{step}) is the step size defined as:

$$begin{aligned} X_{step} = ({u_b} – {l_b})exp ( – {C_2}T/{T_{max }}), end{aligned}$$
(18)

where (C_2) is the step factor associated with whale fall probability and population size, and (u_b) and (l_b) are the upper and lower bounds of the variables, respectively. The whale fall probability (W_f) is calculated as a linear function:

$$begin{aligned} W_f^{} = 0.1 – 0.05T/T_{max }^{}. end{aligned}$$
(19)

BWO is a population-based optimization algorithm, the inspired by the habits of whales in nature. It is updated according to its own position, food, and the positions of other belugas. Second, BWO can jump out of the local optimum by whale falling and introduces the Levy flight mechanism to improve convergence. The exploration capability of BWO is achieved by continuously expanding search agents throughout the search space, and the search trajectories are clustered around the global optimum achieving fast convergence. At the same time, the search history of BWO exhibits an approximately linear search pattern to avoid falling into local optima and ensure its global convergence. The results of BWO are closer to the global optimal solution than those of Genetic Algorithm (GA)28, Particle Swarm Optimization (PSO)29, Gray Wolf Algorithm (GWO)30, Gravitational Search Algorithm (GSA)31 in practical constrained optimization problems. Therefore, we introduce NSBWO based on BWO to optimize the scheduling of shipping and power generation effectiveness in solving the optimization problem of hydropower stations in this paper. The populations are screened using three methods in NSGA-II: fast non-dominated sorting, elite strategy, and crowding degree. Three stages of BWO exploration, exploitation, and falling are used to generate offspring subpopulations. Compared with NSGA-II, the global convergence is guaranteed and the number and quality of feasible solutions obtained by the algorithm are improved. The specific NSBWO algorithm flow is shown in Fig. 6.

Figure 6
figure 6

The flow chat of the NSBWO algorithm.

Full size image

The model for hydropower stations

The whole scheduling model contains five parts: power load, expected arrival time of ships, constraint correction, energy storage strategy, and scheduling optimization. The five components complement each other to form a real-time multi-objective joint scheduling model. The implementation of the model is divided into eight steps as follows:

Step 1: Request grid load demand data from the grid system for the scheduling period and use a forecasting algorithm to predict the expected arrival times of passing ships.

Step 2: Code the downstream flow during the scheduling period with the number of coded bits equal to the number of scheduling times.

Step 3: Analyze the constraints that exist in the actual scheduling process.

Step 4: Determine if there is a ship passage at that moment, if not then scheduling the power generation normally, if there is a downstream flow need to be consistent with the previous period, if the power generation is not enough energy storage mechanism is needed to discharge.

Step 5: Initialize the Populations targets and perform non-dominated stratified ranking.

Step 6: Generate new offspring using the BWO algorithm.

Step 7: Perform fast non-dominated sorting and crowding distance calculation operations.

Step 8: Determine whether the number of iterations meets the condition, and if not, cycle the execution until the condition is met. The scheduling scheme with the optimal value of the discharge flow for each time period of the hydropower station is obtained.

The specific flow chart of the scheduling model is shown in Fig. 7.

Figure 7
figure 7

The multi-objective real-time scheduling model for hydropower stations.

Full size image

Experimental discussion

Experiment introduction

The hardware and software environments used for the experiments are shown in Table 2. The hardware environment includes the processor, memory, and hard disk, and the software environment includes the operating system, programming environment, and development language.

Table 2 Experimental settings.
Full size table
Table 3 Navigation data samples (part I).
Full size table
Table 4 Navigation data samples (part II).
Full size table

Experiment purpose

Prediction of arrival time using the XGBoost model to obtain the actual arrival time of navigable ships. The XGBoost model is implemented to predict the arrival time and get the actual arrival time of navigable ships. Linear regression32, Ridge regression33 and CNN34 algorithms are also used for comparison. We implement the NSBWO algorithm for scheduling the discharge flow for each time period at the Silin Hydropower Station and analyze the feasibility of the method. And also use NSGA-II, GA-NSGA-II, reference vector guided evolutionary algorithm(RVEA)35 NSGA-III36, Multiobjective Evolutionary Algorithm Based on Decomposition(MOEA/D)37 algorithms for experimental comparison with the proposed algorithm.

Experimental process

XGBoost test data processing

Table 3(times) Table 4 shows a sample of the navigation data of a cargo ship on the Wujiang Channel on October 1, 2021, which are seperated in two tables for the simplicity of reading. The data in the tables includes the attributes of ship type, latitude and longitude coordinates, speed, travel distance, and current travel time. Among the attributes, there exist three types of coordinates, namely the start (Start(_)lat, Start(_)lon), the desitination (End(_)lat, End(_)lon), and the current location (Lat, Lon). For the Ship(_)type, we utilize the hot encoding method for discrete data types by setting the type of normal cargo ship to 1. The Duration and Distance are calculated based on the timestamp and coordinates.

XGBoost model experiment setup

In the process of predicting the expected arrival time of a ship, we select nine features, namely the latitude and longitude of the initial and final positions, the real-time latitude and longitude of the ship, the type of the ship, the speed of the ship, and the distance traveled by the ship, as input features, and the travel time of the ship as target features. The data is split into 92% training set and 8% test set. Use Eq. (20) to map the dataset to [0, 1] for normalization.

$$begin{aligned} {X^*} = frac{{X – {X_{Min}}}}{{{X_{Max}} – {X_{Min}}}}. end{aligned}$$
(20)

Grid search is used during parameter tuning of XGBoost models. Perform a grid search on the maximum depth (max_depth) of the tree, the learning rate (learning_rate), and the maximum number of iterations (n_estimators). (max_depth) is set to [7, 9, 11, 13], (learning_rate) is set to [0.01, 0.05, 0.1], and (n_estimators) is set to [500, 1000]. Each parameter combination is traversed once, and the best parameter combination is finally obtained after comparison. The specific optimization parameters are listed in Table 5.

Table 5 Model parameter settings.
Full size table

Scheduling experiment data

The time interval between two adjacent rows of discharge flow data is one minute in the data set we obtained related to the Silin Hydropower Station. However, in the process of actual scheduling of hydropower stations, which is influenced by the start and stop of units, one minute scheduling is not in line with the actual scheduling situation. Therefore, we choose a scheduling interval of half an hour and smooth the data. The average value of the discharge flow is coded every half hour as the discharge flow in the scheduling period. The data related to the scheduling of the hydropower station are the lower limit of downstream flow (i.e., (Flow_down)), the upper limit of downstream flow (i.e., (Flow_up)), the grid load demand (i.e., Load), the ship passage (1 if there is a value otherwise 0), the guaranteed rate of navigation k, and the head H. The specific data is shown in Table 6.

Table 6 Scheduling data samples.
Full size table

We selected the data between 8:00 and 18:00 on October 1, 2021 for the experiment. The estimated time of arrival of the ship obtained by using the XGBoost algorithm is shown in Table 7.

Table 7 Estimated arrival time samples.
Full size table

NSBWO model experiment setup

First, we set the parameters of the model. Since the objective function includes the maximum generation and the highest transportation effectiveness. So the number of objective functions is 2, the population size is set to 100, the maximum number of iterations is 200, and the crossover probability is 0.01, summarized in Table  8.

Table 8 Experimental variables.
Full size table

The total time we schedule the discharge flow of the Silin Hydropower Station is from 8:00 to 18:00, and the scheduling time interval is that 30 min. We first encode the discharge flow, and the discharge flow in each scheduling time period corresponds to a code respectively. According to the time order, we can get the code segments of length 20, which are positioned as C1, C2,…, C20. We also set the upper and lower limits for each time period, as shown in Fig. 8.

Figure 8
figure 8

Decision variable coding.

Full size image

The initial population is then initialized according to the constraints and flow restrictions for each time period to obtain the initial population and perform non-dominated ranking. Using BWO to generate offspring, the offspring population is merged with the parent population to obtain a new population. A new population is formed again by selecting the 100 best individuals in a non-dominated ranking of the new population. Finally, several iterations are performed until the maximum number of iterations is reached to obtain a solution for the discharge flow that takes into account both power generation and shipping.

Result analysis

XGBoost experimental result analysis

We use the same dataset and divide it consistent with the XGBoost model. The comparison experiments were conducted using CNN, Ridge regression, and Linear regression models. The prediction results of the four models are shown in Figs. 9, 10, 11 and 12. Where the red curve represents the real data and the blue curve represents the predicted data. From the images, it can be concluded that the degree of XGboost fitting is close to Linear regression and slightly higher than CNN and Ridge regression.

Figure 9
figure 9

CNN predicted results.

Full size image
Figure 10
figure 10

Ridge regression predicted results.

Full size image
Figure 11
figure 11

Linear regression predicted results.

Full size image
Figure 12
figure 12

XGBoost predicted results.

Full size image

After training the model using historical data, the test data is put into the trained model and the predicted values are calculated. The predicted values of the test data and their matched true values are used for model evaluation. In this paper, the evaluation metrics are evaluated by five regression models as shown in Table 9, where n denotes the number of samples, (y_{i}) denotes the true value of the ith sample, (widehat{y_{i} }) denotes the predicted value of the ith sample, and (overline{y}) denotes the average of the true values.

Table 9 Evaluation metrics.
Full size table

From the experiments, it is concluded that the XGBoost model is optimal in MSE38, MAE39, RMSE40, MAPE41, and R242 metrics, with the R2 fitting metric reaching 0.983. followed by the Linear regression model outperforming the CNN and Ridge regression models. The specific metrics are shown in Table 10.

Table 10 Comparison of predictive models.
Full size table

NSBWO experimental result analysis

We visualize the experimental results in Fig. 13. Each dot in the figure represents a population (i.e., feasible solution), the value of the discharge flow that satisfies the condition in each time period. The x axis represents the value of the solution’s objective function F1 and the y axis represents the value of the objective function F2. We aim to obtain solutions corresponding to both objective functions F1 and F2 that are as large as possible, so the closer the dot is to the upper right in Fig. 13 indicates the better quality of that solution set.

Figure 13
figure 13

NSBWO scheduling.

Full size image

Figure 13 shows that the two solution sets with the objective function values of (917.321, 12429.1) and (924.751, 12424.1) have the highest fitness. Table 11 shows the discharge flow obtained from the solution set (924.751, 12424.1). The three periods in the table, 10:00-10:30, 14:00-14:30, and 16:30-17:00, have ships passing through the gate, and the flow rate under the unit that cannot be shut down needs to be consistent with the previous moment.

Table 11 The discharge flow obtained from the NSBWO experiments.
Full size table

Meanwhile, we compare against five algorithms NSGA-II, GA-NSGA-II, RVEA, NSGA-III, and MOEA/D, and the results are in Fig. 14. It can be seen that the results of the solution objective function of NSBWO are the best. And the number of feasible solutions obtained by NSBWO scheduling is significantly larger than the other five optimization algorithms when the number of populations, constraints and maximum number of evolutionary generations are given the same. Since the results of the other methods are very close to each other, we individually show them in Figs. 15, 16, 17, 18 and 19 for better understanding of each method. To be noted, we normalize the real values in the figures for the visulization purpose. For the real values, each of the y values in Fig. 15 shall add 12825.326. Each of the x values in Fig. 16 shall add 316.93792304. Each of the x values in Fig. 18 shall add 316.93792304.

Figure 14
figure 14

Scheduling algorithm comparison.

Full size image
Figure 15
figure 15

NSGA-II scheduling.

Full size image
Figure 16
figure 16

GA-NSGA-II scheduling.

Full size image
Figure 17
figure 17

RVEA scheduling.

Full size image
Figure 18
figure 18

NSGA-III scheduling.

Full size image
Figure 19
figure 19

MOEA/D scheduling.

Full size image

In each scheduling experiment, we choose the values of the objective functions F1 and F2 of the solution closest to the upper right corner of each figure for comparison. The results are in Table 12. We observe that the F2 value of NSBWO is slightly smaller than the average in the table, NSBWO can be considered to be the best as F1 is much larger than the average.

Table 12 Objective function comparison of scheduling models.
Full size table

Next, we evaluate hypervolume (HV), spacing, and runtime metrics for the scheduling algorithms. The HV metric shows the convergence, stepwiseness, and cardinality of a learning set. If one solution set is better than another, its hypervolume index is greater. Spacing measures the standard deviation of the minimum distance of each solution to other solutions. The smaller the spacing value is, the more uniform the solution set is. The HV and spacing indicator formulas are shown in Eqs. (21) and (22).

$$begin{aligned} S{textrm{pacing}}({textrm{P}}) = sqrt{frac{1}{{|P| – 1}}sum limits _{i = 1}^{|P|} {{{({overline{d}} – {d_i})}^2}} }, end{aligned}$$
(21)

where P denotes a set of uniformly distributed reference points sampled on PF, (d_i) denotes the minimum distance between the i-th solution and other solutions in P, and ({overline{d}}) denotes the mean of all (d_i).

$$begin{aligned} HV = delta left( {bigcup nolimits _{i = 1}^{|S|} {{v_i}} } right) , end{aligned}$$
(22)

where (delta) stands for Lebesgue measure and it is used to measure volume, |S| represents the number of non-dominated solution sets, and (v_i) represents the hypervolume formed by the reference point and the ith solution in the solution set.

The findings of the models are shown in Table 13. In terms of running time, MOEA/D has the longest running time, and the running times of other models are smaller and close. On HV, the NSBWO algorithm has the highest value, and the solution set obtained by NSBWO is significantly better than the solution set obtained by the other models. The Spacing of NSBWO is the highest, but because the other models obtain fewer solution sets, it does not mean that the solution set quality of NSBWO is lower than the other models.

Table 13 Evaluation of experimental results of scheduling models.
Full size table

Conclusion and future work

This paper focuses on the research of hydropower stations integrated into the power grid system, considering the functions of navigation and power generation. We propose a scheduling strategy that considers the real-time passage of ships and the use of energy storage to stabilize the power generation of hydropower stations. The strategy is applied to a real case of the Silin Hydropower Station on the Wujiang waterway in China to show the effectiveness of the proposed solution. However, the method can be applied in other places. Experiments have shown that the XGBoost algorithm predict the time when a ship passes through the gate in real time with (R^{2}) to be 0.98, which is competitive compared with the other three popular models, namely CNN, Ridge Regression, and Linear Regression. The prediction method improves the waiting time for ships to pass through the lock and it also improves the power scheduling effectiveness of hydropower stations. When the power generation of a hydropower station is greater than the demand of the grid, the energy storage is ready to store energy. When it is less than the demand of the grid, the energy storage is discharged. This study uses the NSBWO algorithm combined with NSGA-II and BWO to optimize the discharge flow of the Silin Hydropower Station. This results in the daily operation plan of the hydropower station considering the dual factors of shipping and power generation.

For the future work, we would continue to study the multi-objective optimization problem of hydropower stations. We would also consider the real-time impact on the downstream ecological environment in the actual scheduling processes.