View Issue Details

ID: 0001718
Project: CaseTalk Modeler
Category: Generation (SQL, XML, etc)
View Status: public
Last Update: 2023-02-10 11:33
Reporter: BCP Software
Assigned To: BCP Software
Priority: normal
Severity: feature
Reproducibility: N/A
Status: assigned
Resolution: open
Product Version: 9.0
Target Version: Future
Summary: 0001718: Support Natural Language Processing (NLP)
Description: Many AI-driven NLP libraries are currently available online. They either ship with language and entity models, or require training and tagging models as input.
Generating training and tagging models from CaseTalk would be possible. Using these models and libraries to support modelers is the more intriguing option. Because CaseTalk requires a very strict structure, statistics can only help by marking certain words and looking them up in existing FOM models, in order to suggest potential entities.
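
As a rough illustration of the lookup idea described above, the sketch below matches words found in a text against names that already exist in a model. The function `suggest_entities`, the `known_object_types` set, and the sample sentence are hypothetical; how CaseTalk would actually expose the names in a FOM model is not specified in this issue.

```python
import re


def suggest_entities(text, known_object_types):
    """Return known object type names that occur as words in `text`."""
    # Hypothetical lookup table: lowercase name -> name as spelled in the model
    lookup = {name.lower(): name for name in known_object_types}
    suggestions = []
    for word in re.findall(r"[A-Za-z]+", text):
        match = lookup.get(word.lower())
        # Keep first-seen order and avoid duplicate suggestions
        if match is not None and match not in suggestions:
            suggestions.append(match)
    return suggestions


# Assumed sample data, not taken from a real CaseTalk model
known = {"Apprenticeship", "City", "Category"}
sentence = "The apprenticeship takes place in a city near the coast."
print(suggest_entities(sentence, known))
```

A statistical tagger could replace the plain word scan here, for example by only passing nouns or noun phrases to the lookup.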
Tags: No tags attached.
CaseTalk Edition: unknown

Relationships

related to 0003845 (resolved, BCP Software): Generate SpaCy training data

Activities

BCP Software

2021-04-22 09:52

administrator   ~0001968

Libraries to investigate:

* https://spacy.io/usage/spacy-101
* https://textblob.readthedocs.io/en/dev/
* https://stanfordnlp.github.io/CoreNLP/
* https://www.nltk.org/

Once a primary model is trained, adding a FOM output and parsing unstructured text back to find relations could be a helpful addition to the modeling effort.
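
To make the training step a little more concrete, here is a minimal sketch of what generated training input might look like. It assumes the `(text, {"entities": [(start, end, label)]})` tuple convention with character offsets that is common in spaCy NER examples. The helper `to_training_example` and the label `APPRENTICESHIP` are hypothetical and do not represent actual CaseTalk output.

```python
def to_training_example(sentence, labeled_values):
    """Build (text, {"entities": [(start, end, label), ...]}) using character offsets."""
    entities = []
    for value, label in labeled_values:
        start = sentence.find(value)
        if start == -1:
            continue  # value does not occur in the sentence, so skip it
        entities.append((start, start + len(value), label))
    # Sort by start offset so the annotations follow reading order
    return sentence, {"entities": sorted(entities)}


# Assumed input: a fact expression plus the values a modeler has already qualified
example = to_training_example(
    "Apprenticeship S101 is assigned to Peter Johnson.",
    [("S101", "APPRENTICESHIP"), ("Peter Johnson", "PERSON")],
)
print(example)
```

This only takes the first occurrence of each value; sentences that repeat a value would need a more careful offset search.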
BCP Software

2023-02-10 11:25

administrator   ~0004779


# pip install -U spacy
# python -m spacy download en_core_web_sm
import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")

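# Process whole documents
# Note: adjacent string literals are concatenated without a separator,
# so the sentences below run together (e.g. "...the ICT.Apprenticeship City.")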
text = ("Apprenticeship Assigned."
"Apprenticeship S101 is assigned to Peter Johnson."
"Apprenticeship S209 is assigned to Mary Blight."
"Apprenticeship Category."
"Apprenticeship S101 falls in the ICT."
"Apprenticeship City."
"Apprenticeship S101 takes place in New York."
"Apprenticeship Description."
"Apprenticeship S101 concerns the development of a magazine infomration system."
"Apprenticeship Duration."
"Apprenticeship S101 will take 4 months."
)
doc = nlp(text)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)


Will result in:


Noun phrases: ['Apprenticeship', 'Apprenticeship S101', 'Peter Johnson', 'Apprenticeship S209', 'Mary Blight', 'Apprenticeship Category', 'Apprenticeship S101', 'the ICT.Apprenticeship City', 'Apprenticeship S101', 'place', 'New York', 'Apprenticeship Description', 'Apprenticeship S101', 'the development', 'a magazine infomration system', 'Apprenticeship Duration', 'Apprenticeship S101', '4 months']
Verbs: ['assign', 'assign', 'assign', 'fall', 'take', 'concern', 'take']
Peter Johnson PERSON
Mary Blight PERSON
New York GPE
4 months DATE
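
The noun phrase 'the ICT.Apprenticeship City' in the output above appears to be a side effect of the input. Python joins adjacent string literals without any whitespace, so each sentence runs straight into the next. One way to avoid this, shown here as a sketch with illustrative variable names and only part of the sample text, is to keep the sentences in a list and join them with a space before passing the text to `nlp`. The spaCy output for the corrected text is not shown because it was not re-run here.

```python
# Keep each fact expression as its own list item
sentences = [
    "Apprenticeship Category.",
    "Apprenticeship S101 falls in the ICT.",
    "Apprenticeship City.",
    "Apprenticeship S101 takes place in New York.",
]

# Join with a single space so sentence boundaries stay visible to the parser
text = " ".join(sentences)
print(text)
```

The resulting `text` can be processed with `doc = nlp(text)` exactly as in the note above.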

Issue History

Date Modified Username Field Change
2018-05-13 11:30 BCP Software New Issue
2018-05-13 11:30 BCP Software Status new => assigned
2018-05-13 11:30 BCP Software Assigned To => BCP Software
2021-04-22 09:52 BCP Software Note Added: 0001968
2023-02-10 11:25 BCP Software Note Added: 0004779
2023-02-10 11:33 BCP Software Relationship added related to 0003845