Very much looking forward to spending some time implementing this alongside the article. I really enjoyed your posts about making a Teeny Tiny compiler a while back too!
chubot 1 day ago
It's very nice to see a small type checker in Python, for Python! This became much easier in the last 10 years, since the MyPy team basically "upstreamed" the typed_ast library they were using into the stdlib.
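Here is a minimal sketch of what that buys a checker written in Python. Annotations come back as ordinary expression nodes, and since Python 3.8 `ast.parse(..., type_comments=True)` also keeps `# type:` comments. The source snippet is made up for illustration, and `ast.unparse` needs Python 3.9+.

```python
import ast

# Hypothetical snippet to inspect; any Python source would do.
SRC = """
def add(a: int, b: int) -> int:
    return a + b

xs = []  # type: list[int]
"""

# type_comments=True (Python 3.8+) attaches "# type:" comments to nodes.
tree = ast.parse(SRC, type_comments=True)

func = tree.body[0]    # FunctionDef for add()
assign = tree.body[1]  # Assign for xs

# ast.unparse (Python 3.9+) turns annotation nodes back into source text.
param_types = [ast.unparse(arg.annotation) for arg in func.args.args]
return_type = ast.unparse(func.returns)

print(param_types)          # ['int', 'int']
print(return_type)          # int
print(assign.type_comment)  # list[int]
```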
I found that there are not enough good teaching materials on type checkers -- e.g. the second edition of the Dragon Book lacks a type checker, which is a glaring hole IMO - https://news.ycombinator.com/item?id=38270753
Also, teaching material tends to have a bias toward type inference and the Hindley-Milner algorithm, which are NOT used by the most commonly used languages.
So I appreciate this, but one thing in this code that I find (arguably) confusing is the use of visitors. For example, for this part, I had to go look up what this method does in Python:
  # Default so every expr returns a Type.
  def generic_visit(self, node):
      super().generic_visit(node)
      if isinstance(node, ast.expr):
          return ANY
Also, main() calls visit(), but the visitor methods ALSO call visit(), which obscures the control flow IMO. Personally, if I need to use a visitor, I like there to be just a single pass.
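For anyone else who had to look it up: `ast.NodeVisitor.visit()` calls a method named `visit_<ClassName>` if one exists and otherwise falls back to `generic_visit()`, which visits the children and returns `None`. The override quoted above makes unhandled expression nodes produce a type instead of `None`. The following is a minimal sketch of that behaviour. `ANY` mirrors the quoted snippet, while `INT`, `TinyChecker`, and the string type values are hypothetical. It also shows the nested `self.visit()` calls that make the control flow re-enter the dispatcher at every level.

```python
import ast

# Hypothetical type markers; the article defines its own Type objects.
ANY = "Any"
INT = "int"


class TinyChecker(ast.NodeVisitor):
    def visit_Constant(self, node):
        return INT if type(node.value) is int else ANY

    def visit_BinOp(self, node):
        # Visitor methods call self.visit() on children, so control
        # goes back through the name-based dispatch at each level.
        left = self.visit(node.left)
        right = self.visit(node.right)
        return INT if left == right == INT else ANY

    # Default so every expr returns a Type.
    def generic_visit(self, node):
        super().generic_visit(node)  # still walk the children
        if isinstance(node, ast.expr):
            return ANY
        # statements and other nodes fall through and return None


checker = TinyChecker()
print(checker.visit(ast.parse("1 + 2", mode="eval").body))    # int
print(checker.visit(ast.parse("1 + 'a'", mode="eval").body))  # Any
print(checker.visit(ast.parse("f(1)", mode="eval").body))     # Any (no visit_Call, so generic_visit)
```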
---
In contrast, Essentials of Compilation was released 1 or 2 years ago, in Racket and in Python. And the Python version uses the same typed AST module.
> I found that there are not enough good teaching materials on type checkers -- e.g. the second edition of the Dragon Book lacks a type checker, which is a glaring hole IMO
Pierce’s Types and Programming Languages[1] is excellent. It starts with very little (if you understand basic set-theory notation, you’re probably OK), gets you to a pretty reasonable point, and just generally makes for very pleasant reading. You should probably pick something else if you want a hands-on introduction with an immediate payoff, but then you probably wouldn’t pick the Dragon Book, either.
TaPL really falls down when it comes to bootstrapping your way into understanding the notation. A lot of the notation and theory revolves around, essentially, implementing a concurrent virtual machine. I like the original Algorithm W paper because it doesn't gloss over this conceptual step: it is very much a virtual machine, and you can see the authors handling the edge cases. The operational semantics in TaPL are (frankly) obtuse. Also, TaPL makes it seem like new features can be desugared to old features (and they can), but a little more prose explaining each feature's behavior directly, without just tossing you into the semantic deep end, would have made for a much nicer text.
chubot 1 day ago
Everyone always says that, but I don't think it's a good intro :-) (e.g. I think the top comment in the lobste.rs thread is suffering from the type inference / functional bias, which is not necessarily due to TAPL, but it's a common thing I've noticed.)
Right now I think Siek's book is better for what I want to do, though admittedly I didn't get that far into it, because my type-checking project is way on the back burner.
I would like to see any type checkers that people wrote after reading TAPL!
asplake 1 day ago
I’m writing one now as part of a hobby language project, which I’ll do a Show HN about once I have enough to share. I enjoyed Pierce, but to your point, I am going mostly down the functional route. I'm programming it in Python with the book closed, but after two readings I have what I need in my head (it clicked much better the second time through).
Edit: This project (best fun I’ve had programming in a long while) is what got me sharing Eli Bendersky’s Unification post a couple of weeks back https://news.ycombinator.com/item?id=44938156
zem 1 day ago
Once you get used to it, visitors are a very pleasant way to write AST-walking code in Python. They are essentially generating your case statement for you, so instead of `case ast.Expr(): handle_expr(node)` you just write a `self.visit_Expr` method and have the visitor match the node type to the method name and call it.
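The following is a simplified sketch of that name-based dispatch. `MiniVisitor` and `ExprCounter` are hypothetical; the real `ast.NodeVisitor.visit()` does essentially the same `getattr` lookup. Note that the method name has to match the node class exactly, so `ast.Expr` is handled by `visit_Expr`.

```python
import ast


class MiniVisitor:
    """Simplified stand-in for ast.NodeVisitor's name-based dispatch."""

    def visit(self, node):
        # Build the method name from the node's class: ast.Expr -> "visit_Expr".
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        # Fallback: visit every child node, return nothing.
        for child in ast.iter_child_nodes(node):
            self.visit(child)


class ExprCounter(MiniVisitor):
    """Counts expression statements (ast.Expr), e.g. bare calls."""

    def __init__(self):
        self.count = 0

    def visit_Expr(self, node):  # "visit_expr" would never be called
        self.count += 1
        self.generic_visit(node)


counter = ExprCounter()
counter.visit(ast.parse("print(1)\nx = 2\nprint(x)"))
print(counter.count)  # 2
```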
mrkeen 9 hours ago
Doing it this way maxes coupling and minimises cohesion.
Your language will have a number of phases/passes to carry out. Let's say LambdaLifting, TypeChecking and Inlining.
All the code for lambda lifting belongs in one module, all the code for type-checking in another module, etc.
If you instead use the visitor pattern, you will be looking at all the code related to Variable, Function, Literal in those files respectively.
So when you're working on Function.typecheck(), it will sit in source code just under Function.lambdalift() and just above Function.inline() - things which you don't want to consider together. Meanwhile, you'll need to switch between source files to work on Variable.typecheck() and Literal.typecheck().
Jtsummers 7 hours ago
> If you instead use visitor pattern, you will be looking at all the code related to Variable, Function, Literal in those files respectively.
I've never organized visitor pattern code that way. Usually it's something like:
So related functions (across the types you're visiting) are kept together; you're not revisiting the Function module to add a new visitor there. That would almost defeat the purpose of the pattern.
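Here is a hypothetical sketch of that per-pass layout. It reuses the node names from mrkeen's example (`Literal`, `Variable`, `Function`), and none of it comes from a real compiler. Each pass is one visitor class, so all of the type-checking code lives in one module, and adding a pass doesn't touch the node classes.

```python
from dataclasses import dataclass


# Hypothetical node classes; each only needs to accept a visitor.
@dataclass
class Literal:
    value: object

    def accept(self, visitor):
        return visitor.visit_literal(self)


@dataclass
class Variable:
    name: str

    def accept(self, visitor):
        return visitor.visit_variable(self)


@dataclass
class Function:
    param: str
    body: object

    def accept(self, visitor):
        return visitor.visit_function(self)


# typecheck.py: the whole pass lives in one class.
class TypeChecker:
    def __init__(self, env):
        self.env = env  # variable name -> type name

    def visit_literal(self, node):
        return type(node.value).__name__

    def visit_variable(self, node):
        return self.env.get(node.name, "Any")

    def visit_function(self, node):
        # Parameters are untyped in this toy, so treat them as Any.
        inner = TypeChecker({**self.env, node.param: "Any"})
        return f"Any -> {node.body.accept(inner)}"


# lambdalift.py and inline.py would each be another class with the
# same three visit_* methods.

fn = Function("x", Variable("x"))
print(fn.accept(TypeChecker({})))                       # Any -> Any
print(Literal(3).accept(TypeChecker({})))               # int
print(Variable("y").accept(TypeChecker({"y": "str"})))  # str
```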
No, it's not pleasant at all. It's boilerplate-heavy, non-local and indirect. It's presumably a large part of why pattern matching was added to Python (in 3.10).
flare_blitz 24 hours ago
That's a lot of buzzwords to say that you enjoy shoving everything in one function. :)
grumpyprole 15 hours ago
In hindsight, I think your description is indeed better!
zem 24 hours ago
I guess that's subjective - I'm as big a fan of pattern matching as anyone, but when I was writing a type checker in python we made heavy use of visitors and it made the code pleasant to maintain.
chubot 4 hours ago
That's not the only difference; the other issue is that you lose the stack and must rely on member variables instead. If you search for pop(), you can see that:
  self.push(narrows_true); [self.visit(s) for s in n.body]; self.pop()
  self.push(narrows_false); [self.visit(s) for s in n.orelse]; self.pop()
In the functional style, you just pass a param using the stack, rather than using an explicit stack.
It's not so bad here, but with a big enough language, and more complicated algorithms, the mutable member variables basically become "mutable globals".
And if you re-call visit() at arbitrary depths, IMO the algorithm gets obscured.
zem 4 hours ago
Yeah, pytype used a mix of visitors and if statements (we were trying to retain 3.8 compatibility for a while, so we didn't switch to `match`), depending on what fit various parts of the code best. It wasn't a particularly dogmatic "we will use visitors because that's the one true design pattern" thing; some problems just fit the pattern neatly.
onestay42 1 day ago
It's amazing to me that a python program can be written to make sure another python program is pythoning properly.
goku12 1 day ago
Just curious. Isn't that how development tools generally work? Would you be surprised if it was in and for a compiled language? (This isn't a dismissal. I'm curious about the aspect of this specific case that amuses you.)
mhh__ 23 hours ago
Foundational tooling not being written in a compiled language (fast is good, it could be jitted, but ideally it's a single binary) is actually a huge tax that I'm quite glad we're getting over as an industry.
Python is probably the apex of the "slow + doesn't work without a magic environment" problem.
onestay42 23 hours ago
I suppose it is how this kind of tool generally works. I think it's just some subset of the feeling I get when someone writes (implements?) $LANGUAGE in $LANGUAGE (e.g. brainf*ck in brainf*ck).
EDIT: escaped censorship
mhh__ 23 hours ago
You can see from how quickly the code becomes extremely busy and annoying to read that python being flexible is a blessing and a curse. Maybe curse is the wrong word, but none of this was really designed cohesively so it's usually very janky and a bit slow.
In contrast, Essentials of Compilation was released 1 or 2 years ago, in Racket and in Python. And the Python version uses the same typed AST module.
https://www.amazon.com/Essentials-Compilation-Incremental-Ap...
But it uses a more traditional functional style, rather than the OO visitor style:
https://github.com/IUCompilerCourse/python-student-support-c...
So one thing I did was ask an LLM to translate this code from OO to functional style :-) But I didn't get around to testing it.
(I looked at this code a week ago when it appeared on lobste.rs [1], and sent a trivial PR [2])
[1] https://lobste.rs/s/opwycf/baby_s_first_type_checker
[2] https://github.com/AZHenley/babytypechecker/pull/1
Pierce’s Types and Programming Languages[1] is excellent. It starts with very little (if you understand basic set-theory notation, you’re probably OK), gets you to a pretty reasonable point, and just generally makes for very pleasant reading. You should probably pick something else if you want a hands-on introduction with an immediate payoff, but then you probably wouldn’t pick the Dragon Book, either.
[1] https://www.cis.upenn.edu/~bcpierce/tapl/
I've never organized visitor pattern code that way. Usually it's something like:
So related functions (across the types you're visiting) are kept together; you're not revisiting the Function module to add a new visitor there. That would almost defeat the purpose of the pattern.
https://en.wikipedia.org/wiki/Visitor_pattern - See the UML diagram here.
---
That said, I agreed here that visitors are useful when you need to say traverse all string literals in an AST, at arbitrary depths: https://lobste.rs/s/jdgjjt/visitor_pattern_considered_pointl...
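As a small example of that case, a visitor that only cares about string literals needs just one method, and the inherited `generic_visit` handles the descent to any depth. Here is a minimal sketch; the class name and the source snippet are hypothetical.

```python
import ast


class StringCollector(ast.NodeVisitor):
    """Collect every str constant, however deeply it is nested."""

    def __init__(self):
        self.strings = []

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            self.strings.append(node.value)


SRC = '''
def greet(name="world"):
    return {"msg": name, "tags": ["a", ("b",)]}
'''

collector = StringCollector()
collector.visit(ast.parse(SRC))
print(collector.strings)  # ['world', 'msg', 'tags', 'a', 'b']
```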
---
A sign that this issue isn't settled is that two of the more complex type checkers make opposite decisions:
- MyPy uses visitors extensively - https://github.com/python/mypy/tree/master/mypy
- TypeScript mostly uses switch/case functions - https://github.com/microsoft/TypeScript/blob/main/src/compil...
I'd be interested in analysis of why that is, but I suspect it's mainly style