1. A short-term interval prediction method for photovoltaic power output, comprising steps of:

(1) data preprocessing and correlation determination

first cleaning the collected industrial data, and then analyzing the correlations between influencing factors and photovoltaic power output to reduce the dimension of a sample set and improve the accuracy and computational efficiency of a model; using the Pearson correlation coefficient to measure the degree of correlation between two variables, the form of which is shown in formula (1):

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} \qquad (1)$$

wherein $x_i$ represents a potential influencing factor, such as temperature or weather, on day $i$, and $y_i$ is the corresponding photovoltaic power output data; $\bar{x}$ and $\bar{y}$ respectively represent the average values of the influencing factor and of the photovoltaic power output in the data set; the method is used to quantify the correlation between photovoltaic power output and the potential factors of temperature, humidity and weather index;
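The correlation measure of step (1) can be illustrated with a minimal sketch in plain Python. The function name `pearson_r` and the sample values are assumptions made for illustration only; they are not part of the claimed method.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical daily average temperature and daily power output.
temperature = [18.0, 21.5, 25.0, 27.5, 30.0]
power = [3.1, 3.9, 4.8, 5.2, 6.0]
r_tp = pearson_r(temperature, power)
```

Factors whose coefficient is close to zero could then be dropped from the sample set, which is the dimension reduction the step refers to.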

(2) similar day clustering

1) numerical similarity

based on the correlation analysis results of step (1), constructing a sample set as shown in formula (2):

$$[\,r_{TP}T;\ r_{HP}H;\ r_{WP}W\,] \qquad (2)$$

wherein $T$, $H$ and $W$ respectively represent average temperature, average humidity and weather index, and $r_{TP}$, $r_{HP}$ and $r_{WP}$ respectively represent the correlation coefficients between power and the temperature, humidity and weather factors; the sample set of formula (2) is clustered by the fuzzy C-means (FCM) algorithm according to numerical similarity; the FCM algorithm is a partition-based fuzzy clustering algorithm that obtains the membership of each sample point in every class center by optimizing an objective function; the objective function is:

wherein $d_{vt} = \lVert z_v - s_t \rVert$ represents the Euclidean distance between the $t$th sample point $s_t$ and the $v$th clustering center $z_v$ in the sample set, $e$ is a weighting exponent, $u_{vt}$ is the degree to which $s_t$ belongs to $z_v$, and $N$ is the number of samples; then, a constraint condition is expressed in the form of formula (4):

by introducing a Lagrange multiplier, calculating the membership and the clustering center as shown in formulas (5) and (6):
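For reference, the textbook fuzzy C-means forms that are consistent with the symbols defined above can be written as follows. This is a sketch of the standard formulation rather than a reproduction of the original formulas; the cluster count $C$ and the summation index $k$ are assumed symbols that do not appear in the claim text.

```latex
\begin{align}
J &= \sum_{t=1}^{N} \sum_{v=1}^{C} u_{vt}^{\,e}\, d_{vt}^{2} \tag{3} \\
\sum_{v=1}^{C} u_{vt} &= 1, \qquad t = 1, \dots, N \tag{4} \\
u_{vt} &= \frac{1}{\sum_{k=1}^{C} \left( d_{vt} / d_{kt} \right)^{2/(e-1)}} \tag{5} \\
z_v &= \frac{\sum_{t=1}^{N} u_{vt}^{\,e}\, s_t}{\sum_{t=1}^{N} u_{vt}^{\,e}} \tag{6}
\end{align}
```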

then, updating the membership and the clustering center through iteration, and judging the convergence of the clustering center according to a given threshold; if the maximum number of iterations is reached or the change in the clustering center falls below the given threshold, stopping the iteration, and obtaining multiple similar day sets and their respective clustering centers;
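The iteration described above can be sketched in plain Python as follows. This is a minimal illustration of standard fuzzy C-means, assuming textbook update rules; the function name `fcm`, its parameters (`init_centers`, `tol`, `seed`) and the sample values are hypothetical and not taken from the claim.

```python
import math
import random

def fcm(samples, n_clusters, e=2.0, max_iter=100, tol=1e-6,
        init_centers=None, seed=0):
    """Standard fuzzy C-means; returns (centers, u) with u[v][t] = u_vt."""
    if e <= 1.0:
        raise ValueError("the weighting exponent e must be greater than 1")
    n, dim = len(samples), len(samples[0])
    if init_centers is None:
        # Assumed initialization: pick distinct samples at random.
        rng = random.Random(seed)
        centers = [list(s) for s in rng.sample(list(samples), n_clusters)]
    else:
        centers = [list(z) for z in init_centers]
    power = 2.0 / (e - 1.0)
    u = [[0.0] * n for _ in range(n_clusters)]
    for _ in range(max_iter):
        # Membership update for every sample point.
        for t, s in enumerate(samples):
            d = [math.dist(z, s) for z in centers]
            zero = [v for v, dv in enumerate(d) if dv == 0.0]
            for v in range(n_clusters):
                if zero:
                    # The sample coincides with one or more centers.
                    u[v][t] = 1.0 / len(zero) if v in zero else 0.0
                else:
                    u[v][t] = 1.0 / sum((d[v] / dk) ** power for dk in d)
        # Center update as a membership-weighted mean of the samples.
        new_centers = []
        for v in range(n_clusters):
            w = [u[v][t] ** e for t in range(n)]
            total = sum(w)
            if total == 0.0:
                new_centers.append(centers[v])  # keep a center with no weight
                continue
            new_centers.append(
                [sum(w[t] * samples[t][j] for t in range(n)) / total
                 for j in range(dim)]
            )
        # Stop when no center moved more than the threshold.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Hypothetical weighted samples [r_TP*T, r_HP*H] for six days.
days = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, u = fcm(days, 2, init_centers=[(0.0, 0.0), (5.0, 5.0)])
labels = [max(range(2), key=lambda v: u[v][t]) for t in range(len(days))]
```

Assigning each day to the center with the largest membership, as in `labels`, is one simple way to turn the fuzzy result into similar day sets; the claim does not specify this step.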

2) pattern similarity

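the discrete Fréchet distance is a similarity measure based on spatial paths; it is used to evaluate the similarity between two time series and thus to correct the clustering result; the formula is shown in (7):

$$D_F(L_1, L_2) = \max\left\{ d(L_{1,n}, L_{2,m}),\ \min\left\{ \begin{array}{l} D_F(\langle L_{1,1},\dots,L_{1,n-1}\rangle, \langle L_{2,1},\dots,L_{2,m}\rangle) \\ D_F(\langle L_{1,1},\dots,L_{1,n}\rangle, \langle L_{2,1},\dots,L_{2,m-1}\rangle) \\ D_F(\langle L_{1,1},\dots,L_{1,n-1}\rangle, \langle L_{2,1},\dots,L_{2,m-1}\rangle) \end{array} \right\} \right\} \qquad (7)$$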

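wherein $D_F(L_1, L_2)$ represents the discrete Fréchet distance between curves $L_1$ and $L_2$; $\langle L_{1,1},\dots,L_{1,n}\rangle$ and $\langle L_{2,1},\dots,L_{2,m}\rangle$ represent the ordered substrings composed of the discrete points of $L_1$ and $L_2$, and $n$ and $m$ represent the lengths of $L_1$ and $L_2$ respectively; $d(L_{1,n}, L_{2,m})$ represents the Euclidean distance between $L_{1,n}$ and $L_{2,m}$; formula (7) is solved by a recursive method; when the two discrete substrings recurse to $\langle L_{1,1}\rangle$ and $\langle L_{2,1}\rangle$, the calculation is terminated; then,

$$D_F(\langle L_{1,1}\rangle, \langle L_{2,1}\rangle) = d(L_{1,1}, L_{2,1}) \qquad (8)$$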
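The recursive solution can be sketched as below, using memoization so that each pair of prefixes is evaluated once. This follows the standard discrete Fréchet recursion; the function name `discrete_frechet` and the representation of curve points as tuples are assumptions for illustration.

```python
import math
from functools import lru_cache

def discrete_frechet(curve1, curve2):
    """Discrete Fréchet distance between two sequences of points.

    Each point is a tuple of coordinates, for example (time, power).
    """
    if not curve1 or not curve2:
        raise ValueError("both curves must contain at least one point")

    @lru_cache(maxsize=None)
    def df(i, j):
        # Distance between the last points of the two current prefixes.
        d = math.dist(curve1[i], curve2[j])
        if i == 0 and j == 0:
            return d  # termination case: both prefixes have one point
        if i == 0:
            return max(df(0, j - 1), d)
        if j == 0:
            return max(df(i - 1, 0), d)
        return max(min(df(i - 1, j), df(i, j - 1), df(i - 1, j - 1)), d)

    return df(len(curve1) - 1, len(curve2) - 1)

# Two hypothetical three-point curves, offset vertically by 1.
a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 1), (1, 1), (2, 1)]
distance = discrete_frechet(a, b)
```

The recursion depth grows with the combined length of the two curves, so very long series would call for an iterative table instead of recursion.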

a similar day correction formula based on the discrete Fréchet distance is shown in formula (9):

$$D = \min\left( D_F(Day_{pq}, c_1), \dots, D_F(Day_{pq}, c_d) \right) \qquad (9)$$

wherein $D_F(Day_{pq}, c_d)$ represents the discrete Fréchet distance between sample day $q$ of the class $p$ similar day weather and the class $d$ clustering center, and $D$ is the minimum value of all the discrete Fréchet distances; when $D = D_F(Day_{pq}, c_d)$ and $p \neq d$, this indicates that the pattern similarity between sample day $q$ of the class $p$ similar day weather and the class $d$ weather clustering center is maximal, and the sample day is added to the class $d$ similar day weather; all similar days are corrected by the method, and the result after correction is the similar day division result;
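The correction rule can be sketched as a reassignment pass over the clustered days. In this illustration the distance is passed in as a function so the block stays self-contained; `correct_similar_days`, the class labels and the stand-in distance are hypothetical, and keeping a day in its own class on a tie is an assumption the claim does not address.

```python
def correct_similar_days(clusters, centers, distance):
    """Move each day to the class whose center curve is nearest.

    clusters: dict mapping class label -> list of day curves
    centers:  dict mapping class label -> center curve
    distance: function(curve_a, curve_b) -> float, e.g. discrete Fréchet
    """
    corrected = {label: [] for label in centers}
    for p, days in clusters.items():
        for day in days:
            dists = {d: distance(day, centers[d]) for d in centers}
            best = min(dists, key=dists.get)
            if dists[p] == dists[best]:
                best = p  # assumed tie rule: stay in the original class
            corrected[best].append(day)
    return corrected

# Stand-in distance for the demo only (largest pointwise gap).
def max_gap(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

clusters = {"sunny": [[1, 5, 1], [0, 1, 0]], "cloudy": [[0, 1, 1]]}
centers = {"sunny": [1, 5, 1], "cloudy": [0, 1, 0]}
result = correct_similar_days(clusters, centers, max_gap)
```

In the method itself, the discrete Fréchet distance would take the place of `max_gap`.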

(3) construction of sample observation values based on an adaptive scale coefficient interval estimation method

constructing a prediction sample set, the input and output of which are shown in formula (10) and formula (11):

$$[\,\tilde{t}_h,\ T_h,\ \tilde{H}_h,\ W_h,\ T_{h+1},\ \tilde{H}_{h+1},\ W_{h+1}\,] \qquad (10)$$

$$[\,H_{h+1},\ L_{h+1}\,] \qquad (11)$$

wherein $\tilde{t}$ is time, $T$ is temperature, $\tilde{H}$ is humidity and $W$ is a weather type index; $h$ and $h+1$ respectively represent the current time and the prediction time, and $H_{h+1}$ and $L_{h+1}$ are respectively the upper and lower limits of the prediction interval at time $h+1$; because the sample set lacks observation values of the upper and lower limits of the photovoltaic power output interval, a variable scale coefficient is constructed to determine the observation values; the specific formula is shown in formula (12):

wherein $\alpha$ and $\beta$ are the fixed upper-limit scale factor and lower-limit scale factor, respectively; $a + b\tilde{k}$ is a penalty function; $a$ and $b$ are constants; $\tilde{k}$ is a penalty factor, which is expressed as formula (13):

wherein $\bar{P}$ is the average power value of the observed samples, and $P_g$ is the corresponding power value at time $g$; the values of $\alpha$, $\beta$, $a$ and $b$ are obtained by the NSGA-II multi-objective optimization algorithm; the penalty factor $\tilde{k}$ dynamically adjusts the scale of the upper and lower limits of the interval according to the power amplitude;
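As an illustration of the sample layout in formula (10), the input vectors could be assembled from aligned series as sketched below. The function name `build_inputs` and the sample values are hypothetical; the interval observation values of formula (11) are not constructed here because they depend on formulas (12) and (13).

```python
def build_inputs(times, temperature, humidity, weather):
    """Build one input vector per current time h, pairing h with h+1.

    Each vector is [time_h, T_h, humidity_h, W_h, T_{h+1}, humidity_{h+1}, W_{h+1}].
    """
    n = len(times)
    if not (len(temperature) == len(humidity) == len(weather) == n):
        raise ValueError("all series must have the same length")
    return [
        [times[h], temperature[h], humidity[h], weather[h],
         temperature[h + 1], humidity[h + 1], weather[h + 1]]
        for h in range(n - 1)
    ]

# Three hypothetical time steps yield two input vectors.
inputs = build_inputs([0, 1, 2], [20, 21, 22], [50, 55, 60], [1, 1, 2])
```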

(4) construction of the interval prediction model based on NSGA-II-DLSSVM method

step 1: randomly initializing M populations, and each population comprising a set of parameters: $\alpha$, $\beta$, $a$, $b$, $\gamma_1$, $\sigma_1$, $\gamma_2$ and $\sigma_2$, wherein $\gamma_1$ and $\sigma_1$ are the parameters of a vector machine 1, $\gamma_2$ and $\sigma_2$ are the parameters of a vector machine 2, and the other parameters are the parameters of the estimation method of the upper and lower limits of the variable scale interval in formula (12);

step 2: substituting initialization parameters into the model, and obtaining the upper and lower limits of interval prediction by training samples based on DLSSVM combined with the estimation method of the upper and lower limits of the variable scale interval;

step 3: calculating the function values of the two objectives of interval coverage probability and average width according to the prediction results, wherein the interval coverage probability represents the proportion of the actual data that falls within the prediction interval and is one of the important indexes for determining the accuracy of interval prediction; the formula is as follows:

wherein PIC represents the interval coverage probability; $K$ represents the number of samples; the value of $a_r$ is 0 or 1; if the target value $y_r$ of the $r$th sample is in the prediction interval, $a_r$ is 1; otherwise, $a_r$ is 0; the definition is shown in formula (15):

the average width of the interval is used as another interval prediction index to improve interval prediction quality; the expression of the average width of the interval is:

wherein WI represents the average width of the interval; $H_l$ and $L_l$ respectively represent the upper limit and the lower limit of interval prediction at time $l$; and $R$ is a range of the upper limit and the lower limit of the prediction interval, and is used to normalize WI;
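The two objectives can be sketched in plain Python under the usual definitions: coverage as the mean of the indicator values, and width as the mean interval width divided by $R$. The function names, the inclusive boundary test and the choice to pass $R$ as the argument `value_range` are assumptions for illustration.

```python
def interval_coverage_probability(targets, lower, upper):
    """Fraction of targets that lie inside their prediction interval."""
    # Indicator is 1 when the target is inside the interval (bounds included).
    hits = [1 if lo <= y <= hi else 0
            for y, lo, hi in zip(targets, lower, upper)]
    return sum(hits) / len(hits)

def normalized_average_width(lower, upper, value_range):
    """Mean interval width divided by the normalizing range R."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return sum(widths) / (len(widths) * value_range)

# Hypothetical power values and predicted interval limits.
targets = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 2.5, 2.0, 3.0]
upper = [1.5, 3.5, 4.0, 5.0]
pic = interval_coverage_probability(targets, lower, upper)
wi = normalized_average_width(lower, upper, value_range=4.0)
```

A wider interval raises coverage but also raises the width, which is why the two values are treated as competing objectives in the optimization.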

step 4: sorting an objective function solution corresponding to each individual in the populations based on improved fast non-dominated sorting to reduce sorting complexity and shorten sorting time;

step 5: calculating and sorting the crowding distance of the individuals in the same layer after non-dominated sorting, and retaining good individuals of the parent generation by an elitist retention strategy;

step 6: performing copying, crossover and mutation on the parent generation to produce an offspring generation, merging the parent and offspring generations, and updating the population parameters;

step 7: repeating steps 2-6 until the number of iterations reaches a set number or the model performance improvement is less than the given threshold.
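Steps 4 and 5 can be illustrated with a minimal sketch of the standard NSGA-II fast non-dominated sort and crowding distance. This is the textbook procedure, not the improved sorting the claim refers to, whose details are not given here. Both objectives are treated as minimized, so coverage is assumed to enter as 1 − PIC; the function names and the sample objective values are hypothetical.

```python
def dominates(a, b):
    """True when a is no worse than b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def fast_non_dominated_sort(objs):
    """Return fronts as lists of indices into objs, best front first."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # solutions each index dominates
    counts = [0] * n                    # how many solutions dominate it
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
        i += 1
    return fronts[:-1]  # drop the trailing empty front

def crowding_distance(objs, front):
    """Crowding distance of each index in one front."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        # Boundary solutions are always kept.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for j in range(1, len(ordered) - 1):
            gap = objs[ordered[j + 1]][k] - objs[ordered[j - 1]][k]
            dist[ordered[j]] += gap / (hi - lo)
    return dist

# Hypothetical (1 - PIC, WI) pairs for five individuals.
objs = [(0.1, 0.5), (0.2, 0.3), (0.3, 0.1), (0.3, 0.6), (0.4, 0.4)]
fronts = fast_non_dominated_sort(objs)
crowd = crowding_distance(objs, fronts[0])
```

Individuals would then be ranked by front first and by larger crowding distance within a front when the next generation is selected.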