US Pat. No. 10,509,809

CONSTRUCTING GROUND TRUTH WHEN CLASSIFYING DATA

Amperity, Inc., Seattle,...

1. A system comprising:
a database that stores a plurality of records and a plurality of features for the plurality of records; and
a memory coupled to a processor, the memory comprising a plurality of instructions that cause the processor to:
perform pairwise comparisons on at least a portion of the plurality of records to generate a feature signature for each pairwise comparison, wherein the feature signature indicates common features between the pair of records being compared; and
generate output data, based on the pairwise comparisons, comprising a list of unique feature signatures and comprising corresponding record pairs sampled according to a predetermined sample size for each unique feature signature,
wherein the plurality of instructions further cause the processor to obtain user data for determining a precision value or recall value of a classifier, the classifier being configured to classify the feature signatures.
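
The steps recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation: the function names (feature_signature, sample_pairs_by_signature, estimate_precision), the exact-equality comparison, the sample records, and the fixed random seed are all assumptions made for illustration. The sketch compares every pair of records, derives a signature listing the features the pair has in common, and samples up to a fixed number of pairs for each unique signature; user-supplied labels for the sampled pairs can then be used to estimate a classifier's precision.

```python
import itertools
import random
from collections import defaultdict


def feature_signature(a, b, features):
    """Hypothetical signature: names of features on which both records agree."""
    return tuple(f for f in features if a.get(f) and a.get(f) == b.get(f))


def sample_pairs_by_signature(records, features, sample_size, seed=0):
    """Group record pairs by signature, then sample up to sample_size pairs each."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    pairs_by_sig = defaultdict(list)
    for (i, a), (j, b) in itertools.combinations(enumerate(records), 2):
        pairs_by_sig[feature_signature(a, b, features)].append((i, j))
    return {
        sig: rng.sample(pairs, min(sample_size, len(pairs)))
        for sig, pairs in pairs_by_sig.items()
    }


def estimate_precision(user_labels, predicted):
    """Fraction of predicted matches that the user labeled as true matches."""
    positives = [p for p, is_match in predicted.items() if is_match and p in user_labels]
    if not positives:
        return None
    return sum(user_labels[p] for p in positives) / len(positives)


# Illustrative records only; not taken from the patent.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "zip": "98101"},
    {"name": "Ann Lee", "email": "ann@example.com", "zip": "98109"},
    {"name": "A. Lee", "email": "ann@example.com", "zip": "98101"},
    {"name": "Bob Roe", "email": "bob@example.com", "zip": "98101"},
]
output = sample_pairs_by_signature(records, ["name", "email", "zip"], sample_size=1)
```

Here output maps each unique signature to its sampled record pairs, identified by record index. Sampling per signature rather than over all pairs keeps rare signatures represented in the set shown to the user.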

US Pat. No. 10,599,395

DYNAMICALLY MERGING DATABASE TABLES

Amperity, Inc., Seattle,...

1. A system comprising:
a database that stores a first database table and a second database table, the first database table comprising a first set of records and a first plurality of fields for the first set of records, the second database table comprising a second set of records and a second plurality of fields for the second set of records; and
a memory coupled to a processor, the memory comprising a plurality of instructions that cause the processor to:
obtain a user-specified threshold confidence level;
compare the first set of records and the second set of records to:
identify, according to the user-specified threshold confidence level, a set of related record pairs, wherein each related record pair includes a record in the first set of records and a record in the second set of records, wherein each related record pair is determined by comparing field values to generate a feature signature for the related record pair and classifying the feature signature to generate a confidence score, and
identify a set of unique records among the first set of records and the second set of records; and
generate a dynamically merged database table comprising a selected portion of the set of related record pairs and comprising the set of unique records.
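
The merge recited in this claim can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the confidence function is a stand-in classifier that scores a signature by the fraction of compared fields that match, and the names merge_tables, feature_signature, and confidence, along with the sample tables, are hypothetical. Pairs scoring at or above the user-specified threshold are treated as related and combined into one row; records that appear in no related pair are carried over as unique records.

```python
def feature_signature(a, b, fields):
    """Hypothetical signature: compared fields on which both records agree."""
    return tuple(f for f in fields if a.get(f) is not None and a.get(f) == b.get(f))


def confidence(signature, fields):
    """Stand-in classifier (an assumption): share of compared fields that match."""
    return len(signature) / len(fields)


def merge_tables(first, second, fields, threshold):
    """Return (merged rows, related pairs) for two lists of dict records."""
    related = []
    matched_first, matched_second = set(), set()
    for i, a in enumerate(first):
        for j, b in enumerate(second):
            score = confidence(feature_signature(a, b, fields), fields)
            if score >= threshold:
                related.append((i, j, score))
                matched_first.add(i)
                matched_second.add(j)
    # Related pairs become one row; values from the first table win on conflict.
    merged = [{**second[j], **first[i]} for i, j, _ in related]
    # Records that appear in no related pair are kept as unique records.
    merged += [r for i, r in enumerate(first) if i not in matched_first]
    merged += [r for j, r in enumerate(second) if j not in matched_second]
    return merged, related


# Illustrative tables only; not taken from the patent.
first = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0100"},
    {"name": "Bob Roe", "email": "bob@example.com", "phone": "555-0101"},
]
second = [
    {"name": "Ann Lee", "email": "ann@example.com", "city": "Seattle"},
    {"name": "Cy Poe", "email": "cy@example.com", "city": "Tacoma"},
]
merged, related = merge_tables(first, second, ["name", "email"], threshold=0.5)
```

Because the threshold is an argument rather than a fixed constant, the same comparison can be rerun at a different confidence level to produce a stricter or looser merged table.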