Patent No. | 10,511,659 |
---|---|
Issue Date | December 17, 2019 |
Title | Global Benchmarking And Statistical Analysis At Scale |
Inventorship | Patricia Gomes Soares Florissi, Briarcliff Manor, NY (US); Ido Singer, Nes Ziona (IL); Ofri Masad, Beer-Sheva (IL) |
Assignee | EMC IP Holding Company LLC, Hopkinton, MA (US) |

1. A method comprising:

receiving results of intermediate statistical computations performed on respective ones of a plurality of datasets in respective ones of a plurality of distributed processing nodes configured to communicate over at least one network;

performing at least one global statistical computation based at least in part on the results of the intermediate statistical computations; and

utilizing a result of the global statistical computation to perform one or more benchmarking operations for specified parameters relating to the plurality of datasets;

wherein the distributed processing nodes are associated with respective distinct data zones in which the respective datasets are locally accessible to the respective distributed processing nodes;

wherein the global statistical computation comprises at least one of:

computing a global standard deviation of values for a specified parameter based at least in part on sums of differences of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations and wherein the intermediate statistical computations determine the sums of differences relative to a global average of values for the specified parameter as determined in another global statistical computation performed in a previous iteration; and

computing a global histogram of values for a specified parameter based at least in part on histogram pair lists of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations wherein a given one of the histogram pair lists comprises a list of histogram slices with corresponding numbers of items in those histogram slices and wherein the intermediate statistical computations determine the histogram pair lists based at least in part on inputs including a minimum value, a maximum value and a number of histogram slices to be included in the corresponding histogram; and

wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
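
The two global computations recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names are hypothetical, each data zone is modeled as a plain list of numbers, and the "sums of differences" are assumed here to be sums of squared differences from the global average, giving a population standard deviation. The histogram slices are assumed to be equal-width and identified by index.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers: the claim does not prescribe any particular API.

def local_sum_and_count(values: Sequence[float]) -> Tuple[float, int]:
    """Intermediate computation for the first iteration (global average)."""
    return sum(values), len(values)

def global_average(partials: Sequence[Tuple[float, int]]) -> float:
    """Combine per-zone (sum, count) pairs into a global average."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

def local_sum_sq_diff(values: Sequence[float], avg: float) -> Tuple[float, int]:
    """Intermediate computation for the second iteration.

    Assumption: differences are squared before summing.
    """
    return sum((v - avg) ** 2 for v in values), len(values)

def global_std(partials: Sequence[Tuple[float, int]]) -> float:
    """Combine per-zone sums of squared differences (population form)."""
    total_sq = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return math.sqrt(total_sq / count)

def local_histogram(values: Sequence[float], min_value: float,
                    max_value: float, num_slices: int) -> List[Tuple[int, int]]:
    """Build a histogram pair list of (slice index, item count) for one zone."""
    width = (max_value - min_value) / num_slices
    counts = [0] * num_slices
    for v in values:
        if min_value <= v <= max_value:
            # Clamp so that v == max_value falls in the last slice.
            idx = min(int((v - min_value) / width), num_slices - 1)
            counts[idx] += 1
    return list(enumerate(counts))

def global_histogram(pair_lists: Sequence[List[Tuple[int, int]]],
                     num_slices: int) -> List[Tuple[int, int]]:
    """Merge per-zone pair lists by adding the counts slice by slice."""
    merged = [0] * num_slices
    for pairs in pair_lists:
        for idx, count in pairs:
            merged[idx] += count
    return list(enumerate(merged))

# Two illustrative data zones; raw values never leave their zone.
zones = [[1.0, 2.0, 3.0], [4.0, 5.0]]
avg = global_average([local_sum_and_count(z) for z in zones])
std = global_std([local_sum_sq_diff(z, avg) for z in zones])
hist = global_histogram([local_histogram(z, 0.0, 6.0, 3) for z in zones], 3)
```

In this sketch only the per-zone aggregates (sums, counts and pair lists) are passed to the global step, which mirrors the claim's separation between intermediate and global computations. The standard deviation needs two passes because each zone must first receive the global average from the previous iteration.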
