Stergios Chatzikyriakidis
Professor of Computational Linguistics, University of Crete
Associate Researcher, Centre for Linguistic Theory and Studies in Probability (CLASP)

Software, Apps and Datasets

Applications & Platforms

  • 1. Muves
    An AI research assistant platform for discovery, analysis, and organization of scientific papers. Features include Project Workspace, automated literature reviews, interactive chat, Paper Analysis Agent, and Topic Summary Agent.
    Link to Platform
  • 2. MEDEA-NEUMOUSA
    AI platform for Classical Philology with 5 main functions: (1) Necromancer (translation for 18 dead languages), (2) Knowledge Graph Extraction, (3) Zeugma (neuro-symbolic reasoning with Prolog), (4) Emotion Knowledge Graph, (5) Semantic Analysis.
    Link to Platform | GitHub
  • 3. Simasia-Studio (TextCraft)
    AI text editor and translator performing grammatical/stylistic analysis and automatic correction with domain-specific RAG translation.
    Link to Platform | GitHub
  • 4. RAG-to-Coq Pipeline (Ragged Events)
    AI platform for Historical Event Extraction with 10 extraction modes (Zero-Shot to Few-Shot+RAG+KG) and translation to Coq for formal verification.
    GitHub
  • 5. NATS (Natural Language Analysis & Text Suite)
    Comprehensive application for computational analysis of literary texts featuring Enhanced Document Embeddings, NER for 19 entity types, and Network Analysis.
    GitHub
  • 6. Linguistic Distance Calculator
    Platform for measuring distances between ancestor and descendant languages using lexical, phonological, syntactic, and morphological metrics.
    Link to Platform | GitHub
  • 7. Greek Curriculum Ontology Extractor
    AI platform for Automatic Extraction and Analysis of Ontologies from Greek Curricula using LLMs and RAG, with inconsistency detection.
    GitHub
  • 8. Syntax-Expert
    AI system for analyzing syntactic phenomena based on 4 theoretical models (Minimalism, HPSG, LFG, Dynamic Syntax) with comparative and hybrid analysis.
    GitHub
  • 9. Phylogenetic Linguistic Distance System
    System measuring language distance via 7 dimensions (Lexical, Phonological, Syntactic, Morphological, Typological, Cognate, URIEL) for 11 historical language pairs.
    Link to Platform | GitHub
  • 10. DI_detector
    Greek dialect identification system using classical machine learning (Naïve Bayes, SVMs) supporting Cypriot, Pontic, Northern, and Cretan dialects.
    Link to Tool | GitHub

Datasets

  • 11. Greek Dialects Dataset (GRDD & GRDD+)
    Corpus of Greek dialectal varieties. GRDD (~4M words): Cypriot, Pontic, Cretan, Northern. GRDD+ (~7M words): adds Greco-Corsican, Griko, Maniot, Heptanesian, Tsakonian, Katharevousa.
    GitHub
  • 12. Greek Rhyme Dataset
    A dataset for Greek rhyme analysis.
    GitHub
  • 13. Dialogue NLI Dataset (DNLI)
    First dataset for Natural Language Inference in natural dialogue settings, featuring disfluencies (hesitations, false starts) and dialogue-specific phenomena.
    GitHub
  • 14. OYXOY Test Suite
    Modern Greek NLI benchmark with 4 tasks: Multi-label fine-grained NLI, Word Sense Disambiguation (via example comparison and via sense selection), and Metaphor Detection. Features 1,763 NLI pairs and 6,896 word senses.
    GitHub
  • 15. SuperOYXOY
    Extension of OYXOY including Paraphrase Detection, Augmented NLI, and Bias Recognition.
  • 16. Fine-Grained Entailment Resources
    Three datasets: (1) Extended Greek FraCaS (774 examples), (2) SuperGLUE/RTE with Missing Hypotheses completed, (3) De-dropped Greek XNLI (restored pronouns).
    GitHub
  • 17. Precise Entailment RTE 2.0
    RTE dataset annotated for precise entailment, identifying missing logical premises in natural text.
    Paper/Link
  • 18. Shami Corpus
    First large-scale Levantine Arabic corpus (~110k sentences) covering Jordanian, Syrian, Palestinian, and Lebanese varieties.
    GitHub
  • 19. ATSAD (Arabic Tweets Sentiment Analysis Dataset)
    Dataset of 36k Arabic tweets with sentiment annotations, emojis, and distant supervision.
    GitHub
  • 20. Shami-Senti
    First sentiment analysis dataset for Levantine Arabic with ternary classification (~2.5k examples).
    GitHub
  • 21. Interwar poetry and prose dataset
    Corpus of Modern Greek interwar poetry used for retrieval-augmented generation (RAG).
    Link to Dataset

Code & Libraries

  • 22. Coq for Natural Language Semantics / FraCoq
    Open-source Coq code for the formal verification of semantic models, accompanying the monograph "Formal Semantics in Modern Type Theories".
    GitHub (Book Code) | GitHub (FraCoq)
  • 23. Compositional Bayesian Semantics
    Haskell implementations for modeling Bayesian techniques in natural language semantics.
    GitHub | GitHub (BBCLM2019)
  • 24. Anvec: Detecting Metaphoricity
    Python code and online demo for predicting metaphoric usage in Adjective-Noun pairs using Deep Learning.
    GitHub