Information Extraction from Unstructured Text

Fast NER inference and clustering across thousands of documents

NLP (Natural Language Processing)

is a subfield of artificial intelligence (AI) concerned with the interaction between humans and computers through natural language. It uses computational techniques to analyse, understand, and generate human language, and to process large volumes of language data, including text, speech, and even sign language. Typical NLP tasks include machine translation, sentiment analysis, speech recognition, text-to-speech conversion, and text summarization. NLP underpins a wide range of applications, such as virtual assistants, chatbots, machine translation systems, and search engines.

NER (Named Entity Recognition)

is a subtask of natural language processing (NLP) that identifies named entities in text and classifies them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages. NER extracts structured information from unstructured text, which can then feed downstream NLP tasks such as information retrieval, question answering, and machine translation.
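To make the "text in, labelled spans out" idea concrete, here is a minimal toy sketch. It is not a trained NER model: it assumes a tiny hand-written lookup table (`GAZETTEER`) and two regular expressions (`PATTERNS`), all of which are hypothetical and exist only for illustration. A real system would replace `extract_entities` with a statistical or neural tagger, but the shape of the output is the same.

```python
import re

# Hypothetical gazetteer: a tiny hand-made lookup table, not a trained model.
GAZETTEER = {
    "Ada Lovelace": "PERSON",
    "Acme Corp": "ORG",
    "London": "LOC",
}

# Simple patterns for two of the numeric categories mentioned above.
PATTERNS = [
    ("MONEY", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
]

def extract_entities(text):
    """Return (start, end, surface_text, label) tuples sorted by position."""
    found = []
    # Exact-match lookup for known names.
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.end(), match.group(), label))
    # Pattern-based lookup for monetary values and percentages.
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), match.group(), label))
    return sorted(found)

text = "Ada Lovelace joined Acme Corp in London for $1,200, a 15% raise."
entities = extract_entities(text)
for start, end, surface, label in entities:
    print(f"{label:8} {surface!r} [{start}:{end}]")
```

Each tuple is a structured record (character offsets, surface text, category) recovered from free text, which is the kind of output downstream tasks consume.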

Inference

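is the process of using a trained model to make predictions on new, unseen data. For NER, that means running an already-trained model over documents to tag the entities they contain.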
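The split between training and inference can be sketched with a deliberately small model. The example below assumes a nearest-centroid classifier and made-up two-number features per document (word count and digit count); the function names `train` and `predict` and all the data are hypothetical. The point is only that `predict` reads the fitted model and never changes it.

```python
def train(samples):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Inference: return the label whose centroid is closest to the input."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Hypothetical labelled training data: (word_count, digit_count) per document.
training_data = [
    ((120.0, 2.0), "letter"),
    ((150.0, 1.0), "letter"),
    ((80.0, 40.0), "invoice"),
    ((95.0, 55.0), "invoice"),
]
model = train(training_data)  # training happens once

# Inference: documents the model has never seen.
new_documents = [(130.0, 3.0), (90.0, 48.0)]
predictions = [predict(model, doc) for doc in new_documents]
print(predictions)
```

Because inference only reads the model, it can be repeated over thousands of documents, and batched or parallelised, without retraining.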

Clustering

is a technique in machine learning and data mining that groups a set of objects so that objects in the same group (a cluster) are more similar to each other than to objects in other clusters. Clustering is a form of unsupervised learning: it finds patterns or relationships in a dataset without labeled data. Clustering algorithms are used for applications such as market segmentation, document grouping, image segmentation, and anomaly detection. Common examples include k-means, hierarchical clustering, and density-based clustering.
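As an illustration of one of the algorithms named above, here is a minimal k-means sketch in plain Python. It assumes 2-D points, squared Euclidean distance, a fixed number of iterations, and random initial centroids drawn from the data; the function name `kmeans`, its parameters, and the sample points are all hypothetical. A production pipeline would normally use a library implementation instead.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical data: two well-separated groups, with no labels supplied.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

No labels are passed in; the grouping emerges from distances alone. The cluster ids themselves (0 or 1) are arbitrary and depend on the initial centroids, so only the grouping is meaningful.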