CNLearn Schemas - Creating Some Pydantic Models and Testing Them
You won’t skip class today... That’s exactly what we’ll discuss today. This is (somewhat of) a rewrite of the original post here, but many things have changed since. For one, I am using Pydantic for my vocabulary structures. Why? Well, it provides validation and it also integrates nicely with the Web CNLearn version. Pydantic models have nice JSON exporting and a few other things. I want the backend code to be as decoupled as possible from the GUI. That way, if I decide to switch from Kivy at some point in the future (I have no intention to, but who knows what I will want to learn next), hopefully it’s easier…hopefully.
(Yes, I know there’s been some controversy and discussion regarding Pydantic’s type annotations and the upcoming Python 3.10 version with PEP 563; I won’t get into politics here. I’m sure everyone will be able to work it out. One day I’d like to contribute to that as well: the development I mean, not the arguing and insulting.)
Vocabulary Structures
OK, so what kind of structures will I have? I will have a Common structure which inherits from BaseModel and ABC. A Pydantic model is simply one that inherits from BaseModel. That then provides validation for the fields that we have as well as many useful methods. If you’re curious about what inheriting from BaseModel actually does, have a look at the source code. It does a lot of work when creating a model and when setting attributes. We could have written our own similar version but there’s no need. So we are inheriting from BaseModel. Why are we also inheriting from ABC? And what is an ABC? You don’t know your alphabet? It goes like this: A for Abstract, B for Base, C for Class, D for duck-typing, etc. So what are abstract base classes? There’s some information here and here. Essentially, they contain abstract methods that need to be implemented by the classes inheriting from them. If you’re familiar with other OOP languages, like Java, it’s somewhat like an interface. At the beginning I won’t make heavy use of them but at some point I will.
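To make the abstract-method idea a bit more concrete, here is a minimal sketch using only the standard library. The `Vocab` and `Syllable` names (and the `can_instantiate` helper) are made up for illustration and are not part of CNLearn.

```python
from abc import ABC, abstractmethod


class Vocab(ABC):
    """Hypothetical abstract base class (not part of CNLearn)."""

    @abstractmethod
    def get_pinyin(self) -> str:
        """Child classes have to implement this."""


class Syllable(Vocab):
    """Hypothetical concrete child that implements the abstract method."""

    def __init__(self, pinyin: str) -> None:
        self.pinyin = pinyin

    def get_pinyin(self) -> str:
        return self.pinyin


def can_instantiate(cls, *args) -> bool:
    """Return True if cls(*args) succeeds, False if it raises TypeError."""
    try:
        cls(*args)
    except TypeError:
        return False
    return True
```

Calling `can_instantiate(Vocab)` returns `False` because `Vocab` still has an unimplemented abstract method, while `can_instantiate(Syllable, "bu4")` returns `True`.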
The Common class inherits from BaseModel and ABC: BaseModel for the validation of attributes, ABC for the abstract methods that child classes will have to implement. The Character and Word classes will then inherit from Common. The Radical class, however, will not. Let’s have a look at the Common class:
from abc import ABC, abstractmethod
from typing import Optional

from pydantic import BaseModel


class Common(BaseModel, ABC):
    """
    Common class.
    The Character and Word classes will derive from it.
    Its methods are implemented as ABC methods that its children will have
    to define.
    """

    id: Optional[int]
    definitions: str
    stroke_diagram: Optional[str]  # not yet implemented, will likely be a reference
    # to a SVG file (e.g. 37683.svg)
    simplified: str
    traditional: str
    pinyin_num: str
    pinyin_accent: str
    pinyin_clean: str
    also_pronounced: Optional[str]
    also_written: Optional[str]
    classifiers: Optional[str]
    frequency: int

    class Config:
        # lets the model be built from ORM objects via from_orm
        orm_mode = True

    @abstractmethod
    def list_components(self):
        pass

    @abstractmethod
    def list_words(self):
        pass

    @abstractmethod
    def list_sentences(self):
        pass

    @abstractmethod
    def get_traditional(self):
        pass

    @abstractmethod
    def get_simplified(self):
        pass

    @abstractmethod
    def get_pinyin(self, pinyin_type):
        pass
So what are the required fields? definitions (string), simplified, traditional, pinyin_num, pinyin_accent, pinyin_clean and frequency. The id (an integer from the database) and the remaining fields are declared as Optional, so they can be left out. Now you might be thinking: the SQLAlchemy model we wrote for the Characters table didn’t have all of those. What are you doing??? Well, when creating the Character class it will extract stuff from the Words table as well. That’s why we’ve been implementing some of those CRUD methods.
Let’s also look at the Character and Word structures. In the Character class we have:
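As a rough illustration of required versus optional fields, here is a small sketch. `MiniEntry` and `missing_fields` are hypothetical names, not the real Common class. I give the optional field an explicit `None` default here, since newer Pydantic versions treat a bare `Optional[int]` without a default as required.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class MiniEntry(BaseModel):
    """Hypothetical cut-down model, not the real Common class."""

    id: Optional[int] = None  # explicit default, so it may be omitted
    simplified: str  # no default, so it is required
    frequency: int  # no default, so it is required


def missing_fields(**data) -> list:
    """Return the sorted names of the fields Pydantic rejects for this input."""
    try:
        MiniEntry(**data)
    except ValidationError as exc:
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []
```

With this sketch, `missing_fields(simplified="不")` reports `frequency` as the problem, while passing both `simplified` and `frequency` gives back an empty list even though `id` was never supplied.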
class Character(Common):
    character_type: Optional[CharacterType]  # optional for now
    radical: Optional[str]
    decomposition: Optional[str]
    etymology: Optional[Dict]
It also implements all the methods required by our ABC but they all currently return None. What about the Word structure?
class Word(Common):
    pinyin_no_spaces: str
    components: Optional[List[Character]]
    radical: Optional[Radical]  # a one-character word will have one
    hsk: Optional[HSKLevel]  # some words won't have this
So what is the flow of the programme? A Chinese string is entered -> it gets segmented into Words. Some of them are multiple-character words and some are one-character words. Let’s think of the one-character words first. They will have a Character structure but with information also taken from the Words table. A word with multiple characters will be a Word structure. It will, however, contain one-character word components which are defined as previously mentioned.
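A rough sketch of that flow, using plain dataclasses instead of the real Pydantic structures. Everything here is hypothetical: `CharacterSketch`, `WordSketch` and `build_structures` are stand-in names, and the segmentation step itself is assumed to have already happened (the function receives the segments as a list).

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class CharacterSketch:
    """Stand-in for the Character structure (hypothetical)."""

    simplified: str


@dataclass
class WordSketch:
    """Stand-in for the Word structure (hypothetical)."""

    simplified: str
    components: List[CharacterSketch] = field(default_factory=list)


def build_structures(
    segments: List[str],
) -> List[Union[CharacterSketch, WordSketch]]:
    """Map already-segmented text onto Character-like or Word-like objects."""
    structures: List[Union[CharacterSketch, WordSketch]] = []
    for segment in segments:
        if len(segment) == 1:
            # one-character word: represented by a Character-like structure
            structures.append(CharacterSketch(simplified=segment))
        else:
            # multi-character word: a Word-like structure with its components
            components = [CharacterSketch(simplified=char) for char in segment]
            structures.append(WordSketch(simplified=segment, components=components))
    return structures
```

For example, `build_structures(["我", "不满"])` gives one Character-like object for 我 and one Word-like object for 不满 whose components are 不 and 满.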
OK so we have these structures now. Are you saying you want to test them? I agree! Let’s create a test_schemas.py file in the tests directory.
import pytest
from sqlalchemy.orm.session import Session

from src.schemas.structures import Character, Word
from src.db.models import Word as Word_model, Character as Character_model
from src.db.crud import (
    get_simplified_word,
    get_word_and_character,
)
from src.db.settings import SessionLocal


@pytest.fixture
def db() -> Session:
    """
    Returns a reusable database session.
    """
    session: Session = SessionLocal()
    return session
@pytest.fixture
def my_character_1() -> Character:
    return Character(
        simplified="不",
        traditional="不",
        pinyin_num="bu4",
        pinyin_accent="bù",
        pinyin_clean="bu",
        definitions="(negative prefix); not; no",
        decomposition="⿱一?",
        etymology={
            "type": "ideographic",
            "hint": "A bird flying toward the sky\u00a0\u4e00",
        },
        radical="一",
        frequency=459467,
    )


@pytest.fixture
def my_character_2() -> Character:
    return Character(
        simplified="满",
        traditional="滿",
        definitions="to fill; full; filled; packed; fully; completely; quite; to reach the limit; to satisfy; satisfied; contented",
        pinyin_num="man3",
        pinyin_accent="mǎn",
        pinyin_clean="man",
        decomposition="⿰氵⿱艹两",
        etymology={
            "type": "pictophonetic",
            "phonetic": "\u34bc",
            "semantic": "\u6c35",
            "hint": "water",
        },
        radical="氵",
        frequency=10702,
    )


@pytest.fixture
def my_word_1(my_character_1, my_character_2) -> Word:
    return Word(
        simplified="不满",
        traditional="不滿",
        definitions="resentful; discontented; dissatisfied",
        pinyin_num="bu4 man3",
        pinyin_accent="bù mǎn",
        pinyin_clean="bu man",
        pinyin_no_spaces="buman",
        # for now I am manually specifying what the components are
        # later they will be created automatically
        components=[my_character_1, my_character_2],
        # frequency is declared as an int, so pass an int rather than a string
        frequency=3157,
    )
# let's test some of the dictionaries created
def test_character1_dictionary(my_character_1: Character):
    """
    Tests the fields from Character 1.
    """
    character: Character = my_character_1
    assert character.definitions == "(negative prefix); not; no"
    assert character.simplified == character.traditional == "不"
    assert character.pinyin_accent == "bù"
    assert character.pinyin_num == "bu4"
    assert character.pinyin_clean == "bu"


def test_word_1_components(
    my_word_1: Word, my_character_1: Character, my_character_2: Character
):
    """
    Tests the component characters of a Word schema.
    """
    word: Word = my_word_1
    assert my_character_1 in word.components and my_character_2 in word.components
def test_bu_character_database(db, my_character_1):
    """
    Tests the results for the 不 character from the database through the Character schema
    """
    bu_word, bu_character = get_word_and_character(db, simplified="不")
    bu_character_schema: Character = Character.from_orm(bu_word)
    bu_character_schema.decomposition = bu_character.decomposition
    bu_character_schema.etymology = bu_character.etymology
    bu_character_schema.radical = bu_character.radical
    assert bu_character_schema.traditional == my_character_1.traditional
    assert bu_character_schema.simplified == my_character_1.simplified
    assert bu_character_schema.pinyin_num == my_character_1.pinyin_num
    assert bu_character_schema.pinyin_accent == my_character_1.pinyin_accent
    assert bu_character_schema.pinyin_clean == my_character_1.pinyin_clean
    assert bu_character_schema.definitions == my_character_1.definitions
    assert bu_character_schema.decomposition == my_character_1.decomposition
    assert bu_character_schema.etymology == my_character_1.etymology
    assert bu_character_schema.radical == my_character_1.radical


def test_man_character_database(db, my_character_2):
    """
    Tests the results for the 满 character from the database through the Character schema
    """
    man_word, man_character = get_word_and_character(
        db, simplified="满", pinyin_clean="man"
    )
    man_character_schema: Character = Character.from_orm(man_word)
    man_character_schema.decomposition = man_character.decomposition
    man_character_schema.etymology = man_character.etymology
    man_character_schema.radical = man_character.radical
    assert man_character_schema.traditional == my_character_2.traditional
    assert man_character_schema.simplified == my_character_2.simplified
    assert man_character_schema.pinyin_num == my_character_2.pinyin_num
    assert man_character_schema.pinyin_accent == my_character_2.pinyin_accent
    assert man_character_schema.pinyin_clean == my_character_2.pinyin_clean
    assert man_character_schema.definitions == my_character_2.definitions
    assert man_character_schema.decomposition == my_character_2.decomposition
    assert man_character_schema.etymology == my_character_2.etymology
    assert man_character_schema.radical == my_character_2.radical


def test_buman_character_database(db, my_word_1):
    """
    Tests the results for the 不满 word from the database through the Word schema
    """
    bu_man_word_list: Word_model = get_simplified_word(db, simplified="不满")
    bu_man_word_schema = Word.from_orm(bu_man_word_list[0].Word)
    assert bu_man_word_schema.traditional == my_word_1.traditional
    assert bu_man_word_schema.simplified == my_word_1.simplified
    assert bu_man_word_schema.pinyin_num == my_word_1.pinyin_num
    assert bu_man_word_schema.pinyin_accent == my_word_1.pinyin_accent
    assert bu_man_word_schema.pinyin_clean == my_word_1.pinyin_clean
    assert bu_man_word_schema.definitions == my_word_1.definitions
I won’t go through the details of the tests; they are similar to the ones from previous posts using pytest. I will, however, parametrise them at some point :)
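For anyone wondering what parametrising looks like, here is a small sketch with `pytest.mark.parametrize`. The `strip_tone_numbers` helper is hypothetical and only exists to give the test something to check; it is not part of the CNLearn code.

```python
import pytest


def strip_tone_numbers(pinyin_num: str) -> str:
    """Hypothetical helper: drop the tone digits from numbered pinyin."""
    return "".join(char for char in pinyin_num if not char.isdigit())


@pytest.mark.parametrize(
    "pinyin_num, expected",
    [
        ("bu4", "bu"),
        ("man3", "man"),
        ("bu4 man3", "bu man"),
    ],
)
def test_strip_tone_numbers(pinyin_num: str, expected: str) -> None:
    # pytest runs this function once per (pinyin_num, expected) pair
    assert strip_tone_numbers(pinyin_num) == expected
```

Each tuple in the list becomes its own test case, so one failing pair does not hide the results of the others.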
Finally, the commit for this post is here.