Software Apps and Datasets
Applications & Platforms
1. Muves
An AI research assistant platform for discovery, analysis, and organization of scientific papers. Features include Project Workspace, automated literature reviews, interactive chat, Paper Analysis Agent, and Topic Summary Agent.
Link to Platform
2. MEDEA-NEUMOUSA
AI platform for Classical Philology with 5 main functions: (1) Necromancer (translation for 18 dead languages), (2) Knowledge Graph Extraction, (3) Zeugma (neuro-symbolic reasoning with Prolog), (4) Emotion Knowledge Graph, (5) Semantic Analysis.
Link to Platform | GitHub
3. Simasia-Studio (TextCraft)
AI text editor and translator performing grammatical/stylistic analysis and automatic correction with domain-specific RAG translation.
Link to Platform | GitHub
4. RAG-to-Coq Pipeline (Ragged Events)
AI platform for Historical Event Extraction with 10 extraction modes (Zero-Shot to Few-Shot+RAG+KG) and translation to Coq for formal verification.
GitHub
5. NATS (Natural Language Analysis & Text Suite)
Comprehensive application for computational analysis of literary texts featuring Enhanced Document Embeddings, NER for 19 entity types, and Network Analysis.
GitHub
6. Linguistic Distance Calculator
Platform for measuring distances between languages (ancestor-descendant) using lexical, phonological, syntactic, and morphological metrics.
Link to Platform | GitHub
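The listing does not say how the calculator computes its lexical metric. As a minimal sketch of one common approach, the snippet below averages length-normalized edit distance over aligned word pairs. The function names and the choice of Levenshtein distance are assumptions made for illustration, not the platform's actual implementation.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(pairs: Sequence[Tuple[str, str]]) -> float:
    """Mean normalized distance over (ancestor_word, descendant_word) pairs."""
    if not pairs:
        raise ValueError("at least one word pair is required")
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned word list (Latin, Italian), for illustration only.
    sample = [("pater", "padre"), ("mater", "madre"), ("luna", "luna")]
    print(round(lexical_distance(sample), 3))
```

A score near 0 means the paired word forms are almost identical and a score near 1 means they share almost nothing. A real system would more likely compare phonemic transcriptions than raw spellings.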
7. Greek Curriculum Ontology Extractor
AI platform for Automatic Extraction and Analysis of Ontologies from Greek Curricula using LLMs and RAG, with inconsistency detection.
GitHub
8. Syntax-Expert
AI system for analyzing syntactic phenomena based on 4 theoretical models (Minimalism, HPSG, LFG, Dynamic Syntax) with comparative and hybrid analysis.
GitHub
9. Phylogenetic Linguistic Distance System
System measuring language distance via 7 dimensions (Lexical, Phonological, Syntactic, Morphological, Typological, Cognate, URIEL) for 11 historical language pairs.
Link to Platform | GitHub
10. DI_detector
Greek dialect identification system using classical machine learning (Naïve Bayes, SVMs) supporting Cypriot, Pontic, Northern, and Cretan dialects.
Link to Tool | GitHub
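To illustrate the kind of classical approach named above, here is a minimal sketch of a multinomial Naïve Bayes classifier over character trigrams. The class name, the trigram features, the add-one smoothing, and the toy training strings are all assumptions made for illustration. None of it is taken from DI_detector itself.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Character n-grams of the lowercased text, padded with one space per side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesIdentifier:
    """Hypothetical multinomial Naive Bayes classifier with add-alpha smoothing."""

    def __init__(self, n: int = 3, alpha: float = 1.0) -> None:
        self.n = n
        self.alpha = alpha
        self.label_counts: Counter = Counter()
        self.ngram_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, samples: Iterable[Tuple[str, str]]) -> "NaiveBayesIdentifier":
        """Count n-grams per label from (text, label) pairs."""
        for text, label in samples:
            grams = char_ngrams(text, self.n)
            self.label_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text: str) -> Optional[str]:
        """Return the label with the highest log posterior, or None if untrained."""
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for gram in char_ngrams(text, self.n):
                # Smoothed log likelihood of each n-gram under this label.
                score += math.log(
                    (counts[gram] + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Synthetic placeholder strings, not real dialect data.
    model = NaiveBayesIdentifier().fit(
        [("aaa aaa aab", "variety_a"), ("zzz zzy zzz", "variety_b")]
    )
    print(model.predict("aab aaa"))
```

Character n-grams are a common feature choice for separating closely related varieties because they capture spelling and morphological cues without needing a tokenizer or lexicon.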
Datasets
11. Greek Dialects Dataset (GRDD & GRDD+)
Corpus of Greek dialectal varieties. GRDD (~4M words) covers Cypriot, Pontic, Cretan, and Northern Greek. GRDD+ (~7M words) adds Greco-Corsican, Griko, Maniot, Heptanesian, Tsakonian, and Katharevousa.
GitHub
12. Greek Rhyme Dataset
A dataset for Greek rhyme analysis.
GitHub
13. Dialogue NLI Dataset (DNLI)
First dataset for Natural Language Inference in natural dialogue settings, featuring disfluencies (hesitations, false starts) and dialogue-specific phenomena.
GitHub
14. OYXOY Test Suite
Modern Greek NLP benchmark with 4 tasks: multi-label fine-grained NLI, Word Sense Disambiguation (by example comparison and by sense selection), and Metaphor Detection. Features 1,763 NLI pairs and 6,896 word senses.
GitHub
15. SuperOYXOY
Extension of OYXOY including Paraphrase Detection, Augmented NLI, and Bias Recognition.
16. Fine-Grained Entailment Resources
Three datasets: (1) Extended Greek FraCaS (774 examples), (2) SuperGLUE/RTE with Missing Hypotheses completed, (3) De-dropped Greek XNLI (restored pronouns).
GitHub
17. Precise Entailment RTE 2.0
RTE dataset annotated for precise entailment, identifying missing logical premises in natural text.
Paper/Link
18. Shami Corpus
First large-scale Levantine Arabic corpus (~110k sentences) covering Jordanian, Syrian, Palestinian, and Lebanese varieties.
GitHub
19. ATSAD (Arabic Tweets Sentiment Analysis Dataset)
Dataset of 36k Arabic tweets with sentiment annotations, emojis, and distant supervision.
GitHub
20. Shami-Senti
First sentiment analysis dataset for Levantine Arabic with ternary classification (~2.5k examples).
GitHub
21. Interwar Poetry and Prose Dataset
Corpus of Modern Greek interwar poetry used for RAG generation.
Link to Dataset
Code & Libraries
22. Coq for Natural Language Semantics / FraCoq
Open-source code for formal verification of semantic models in Coq, including the code for the monograph "Formal Semantics in Modern Type Theories".
GitHub (Book Code) | GitHub (FraCoq)
23. Compositional Bayesian Semantics
Haskell implementations of Bayesian techniques for modeling natural language semantics.
GitHub | GitHub (BBCLM2019)
24. Anvec: Detecting Metaphoricity
Python code and online demo for predicting metaphoric usage in Adjective-Noun pairs using Deep Learning.
GitHub