Tuesday, 23 November 2021

05:00 PM

Start Planning Your Escape with Nat Geo’s 25 Amazing Journeys for 2022 [The Adventure Blog] 05:00 PM, Tuesday, 23 November 2021 04:00 AM, Thursday, 06 January 2022

These 25 extraordinary locations are sure to be on more than a few bucket lists and some are in the United States. SUBSCRIBE to GMA's YouTube page: https://b...

04:57 PM

Tree Sitter and the Complications of Parsing Languages [Mastering Emacs] 04:57 PM, Tuesday, 23 November 2021 05:00 PM, Tuesday, 23 November 2021

You might be surprised to hear that, when you visit a file in Emacs, the syntax highlighting you see on your screen is, most likely, a potpourri of regular expressions with a dash of functions and syntax table definitions. As it turns out, this approach is just about good enough right up until the point where it isn’t.
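To make "right up until the point where it isn’t" concrete, here is a minimal sketch in Python of a purely regex-driven highlighter. The keyword list and the function name are made up for illustration; this is not how Emacs's font-lock is implemented, only the same idea in miniature.

```python
import re

# Hypothetical, deliberately naive highlighter: one regex, no parser state.
KEYWORD = re.compile(r"\b(def|return|if|else)\b")

def keyword_spans(line):
    """Return (start, end, text) for every keyword-looking match in line."""
    return [(m.start(), m.end(), m.group()) for m in KEYWORD.finditer(line)]

good = "def f(x): return x"
tricky = 'msg = "press return if done"'

print(keyword_spans(good))    # real keywords
print(keyword_spans(tricky))  # words inside a string literal, matched anyway
```

The regex has no idea that the second line's matches sit inside a string. Fixing that means bolting on more state (syntax tables, extra functions), which is roughly the pile of workarounds described above.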

So there’s a whole host of features in Emacs that try to work around the inevitable performance or parsing gaps, like giving up if the search space is too large, only partially scanning the buffer, and so on. So when the font locking turns to treacle, and if you’re trained in the eldritch arts, you might have strong opinions on arcana like jit-lock-stealth-time and jit-lock-antiblink-grace.

So why keep doing it that way, then? Well, it’s more than good enough. I can think of very few examples where it wasn’t for me. That’s not to say it’s the platonic ideal of what syntax highlighting ought to be, though.

But what’s more surprising is that’s how most IDEs and text editors work. Why?

Well, because it’s gosh-darn hard to do it the right way. The proper way is to start with a grammar of the language, usually in Extended Backus-Naur Form, and work your way through its terse definitions of the language until you have a reasonable grasp of what you need to do and, ah, yes: now you have to write the parser. And it mustn’t be slow, either; oh, and you have to make it work with broken code, too. Because that’s the resting state of all code that you are editing: as you type, the syntax highlighter beavers away in the background to give you some semblance of what reality would look like, if only you’d hurry up and make it syntactically correct, thank-you-very-much.
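As a rough illustration of "grammar first, then a parser that survives broken code", here is a small Python sketch. The two-rule grammar, the node names and the MISSING placeholder are all assumptions made up for this example; a real language needs far more than this.

```python
import re

# Toy grammar in EBNF (hypothetical, for illustration only):
#   expr = term , { "+" , term } ;
#   term = NUMBER | "(" , expr , ")" ;

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("add", node, self.term())
        return node

    def term(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            return ("number", self.advance())
        if tok == "(":
            self.advance()
            inner = self.expr()
            if self.peek() == ")":
                self.advance()
                return ("group", inner)
            # Unclosed paren: keep what we have and note what is missing.
            return ("group", inner, ("MISSING", ")"))
        # Unexpected token or end of input: leave a placeholder, never raise.
        return ("MISSING", "term")

def parse(text):
    """Tokenize numbers, parens and plus signs, then parse one expression."""
    return Parser(re.findall(r"\d+|[()+]", text)).expr()

print(parse("1 + (2 + 3)"))
print(parse("1 + (2 +"))  # half-typed input still yields a tree
```

The second call is the interesting one: instead of throwing a syntax error, the parser hands back a tree with placeholders where the code is not finished yet, so a highlighter still has something to work with.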

If a regular expression is the answer that yields two problems (as the old joke goes), then this is surely the one that yields three or four.

Even if you did have the grammar and an able parser, the grammar might be wrong, or the language might need more context than the grammar alone can give you. For Python it’s good enough; for C or C++, I wish you good luck. And for Perl (or whatever it’s called these days) only Larry Wall himself can save you.

It’s a hard problem, and many have had a bite of the cherry over the years with mixed results. Building a parser that can handle the unsteady state your ever-changing source code finds itself in is very, very difficult. You also need to generate incremental changes to the tree that your parser yields so it doesn’t have to redo the whole thing on every keypress. It’s a really hard problem, but the rewards are well worth it:
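The "don’t redo the whole thing on every keypress" idea can be sketched with a toy model in Python. The big assumption here is that every line is an independent subtree, which no real language satisfies; the class and method names are invented for the example, and this is not the algorithm tree sitter or any other real incremental parser uses.

```python
class IncrementalDoc:
    """Toy model: each line is parsed on its own and cached as a subtree."""

    def __init__(self, text):
        self.lines = text.split("\n")
        self.trees = [self._parse_line(line) for line in self.lines]
        self.parse_count = len(self.lines)  # how many line parses so far

    @staticmethod
    def _parse_line(line):
        # Stand-in for real parsing: a tuple of whitespace-separated tokens.
        return tuple(line.split())

    def edit_line(self, index, new_text):
        """Replace one line and re-parse only that line."""
        self.lines[index] = new_text
        self.trees[index] = self._parse_line(new_text)
        self.parse_count += 1

doc = IncrementalDoc("a = 1\nb = 2\nc = 3")
untouched = doc.trees[0]
doc.edit_line(1, "b = 20")
print(doc.parse_count)           # one extra parse, not three
print(doc.trees[0] is untouched)  # the neighbouring subtree was reused
```

The hard part in practice is everything this sketch assumes away: edits that span node boundaries, and edits that change how the surrounding text should be parsed.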

  1. Perfect syntax highlighting.
  2. Semantic clues, like: variables and function arguments are correctly highlighted in the scope they are relevant in; perfect navigational aids for function and class names; easy refactoring and so much more.
  3. Inspectable tree that you can use to build out additional tooling relevant to a language.
  4. Proper multi-language support, like JavaScript + React-style JSX in the same page. Or PHP + HTML. Or YAML + Jinja, etc.

And the list goes on.
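To give a feel for the "inspectable tree" point, here is a small sketch that uses Python's standard ast module as a stand-in. It is an abstract syntax tree rather than a concrete one, and it does not tolerate broken code, so take it as an illustration of the tooling you can build on a tree, not of the parser itself. The sample source and the outline function are made up for the example.

```python
import ast

SOURCE = '''
class Greeter:
    def greet(self, name):
        return "hi " + name

def main(argv):
    return Greeter().greet(argv[0])
'''

def outline(source):
    """Return (kind, name, line, argument names) for each class and function."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            entries.append(("class", node.name, node.lineno, []))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = [a.arg for a in node.args.args]
            entries.append(("def", node.name, node.lineno, args))
    # Sort by line number so the outline reads top to bottom.
    return sorted(entries, key=lambda entry: entry[2])

for entry in outline(SOURCE):
    print(entry)
```

A dozen lines gets you a jump-to-definition index with argument names, with no regular expressions guessing at where a function starts.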

But what about CEDET?

A long time ago a very smart guy named Eric Ludlam created CEDET, the Collection of Emacs Development Environment Tools, a large collection of development tools that aimed to give Emacs a complete IDE-like experience. Eric clearly worked a lot with C++, so that’s what it supported best, but it also supports many other common languages like Python, Java and JavaScript.

But Eric opted for something much cleverer than just a package for the C++ universe: he wrote the Semantic Bovinator, a parser designed to deliver the four points I mentioned above. Unfortunately it never really caught on, even though an effort was made about 10 years ago to hulk smash parts of the code into Emacs proper, where big parts of it live today. Features like EDE (project management suite), Speedbar (a navbar), EIEIO (Common Lisp-style classes) and Semantic Mode (the main draw of CEDET) made it into Emacs core.

(And yes, Eric clearly loved farm-themed naming schemes. Like the old nursery rhyme: Old MacDonald had a farm… EIEIO)

You can try it right now in your Emacs: open up a JavaScript, C/C++, Java, or Python file and type M-x semantic-mode. Now navigate with M-x senator-xxxxx or check out the Semantic keymap with: C-c , C-h. The grammar files haven’t been updated in a long while so it’s possible your code’s ahead of the grammar and it may fail; but still, a herculean effort, and very impressive. And I’ll bet you didn’t know Emacs had that for the better part of a decade.

I used CEDET for a while back in the day when it was still actively maintained, and in a parallel universe it might’ve been what we’d all be using today. It worked just shy of well enough for Python, so I could not switch to it. It’s a shame it was dropped on the floor as it had everything: EDE, the project management suite; semantic code search and completion in Semantic; Speedbar (M-x speedbar); SRecode, a templatized code generator; and so much more.

Which then brings me to the crux of the article: Tree Sitter.

Tree Sitter

Enter tree sitter. I believe it started its life as the semantic search feature on GitHub itself (hover your cursor over a function call and it’ll take you to its definition), where it’s in use to this day.

It’s quick, and it solves most of the problems I talked about earlier. It also supports an impressive list of languages and has a very large community behind it, which is important. It’s also available in Emacs for you to use right now: the Emacs Tree Sitter package is on MELPA. Download, install, and type M-x tree-sitter-hl-mode in a buffer to try it out. It does require module support in your Emacs, but that’s usually not a problem with newer Emacsen.

So this is the future of incremental language parsing. And it’ll be the future, too, in Emacs, as there are considerations under way to include the bindings needed to talk to tree sitter directly.

But that’s not all. Tree sitter is easy to use, and it comes with a query language that uses S-expressions, which to my mind is a sign that it was meant to be.
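Why S-expressions are such a natural fit for querying trees can be shown with a toy matcher in Python. Everything here is an assumption made for illustration: the tuple-based tree, the node type names, and the matching rules. It is only loosely modelled on the look of tree sitter's queries and has none of the real engine's captures, fields or predicates.

```python
# A node is a tuple: (type, child, child, ...). Leaves are one-element tuples.

def to_sexp(node):
    """Render a node (or a pattern, they share a shape) as an S-expression."""
    kind, *children = node
    if not children:
        return f"({kind})"
    return f"({kind} " + " ".join(to_sexp(c) for c in children) + ")"

def matches(node, pattern):
    """True if the types agree and the pattern's children match, in order,
    some of the node's children (extra children in between are allowed)."""
    kind, *children = node
    p_kind, *p_children = pattern
    if kind != p_kind:
        return False
    remaining = iter(children)  # shared, so matches must appear in order
    return all(any(matches(c, p) for c in remaining) for p in p_children)

def find_all(node, pattern):
    """Collect every node in the tree that matches the pattern."""
    found = [node] if matches(node, pattern) else []
    for child in node[1:]:
        found.extend(find_all(child, pattern))
    return found

tree = ("module",
        ("call", ("identifier",), ("arguments", ("string",))),
        ("assignment", ("identifier",),
         ("call", ("identifier",), ("arguments",))))

pattern = ("call", ("identifier",), ("arguments",))
print(to_sexp(pattern))
print(len(find_all(tree, pattern)))
```

The pattern is itself a little tree, so writing it as an S-expression means the query reads exactly like the thing it matches.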

“But what about LSP?” I can hear some of you say. The reason (most) LSP servers don’t offer syntax highlighting is the drag on performance. Every keystroke you type must be sent to the server, processed, a partial tree returned, and your syntax highlighting updated. Repeat that at up to 100 words per minute (or whatever your typing speed is) and you’re looking at a lot of cross-chatter that is just better suited for in-process communication. But of course that doesn’t mean tree sitter can’t replace the language parsing used for other features in LSP!
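For a back-of-the-envelope feel for that cross-chatter, here is a small Python sketch. It assumes the usual typing-test convention of five characters per word, and it frames a single keystroke as an LSP textDocument/didChange notification with the protocol's Content-Length header. The file URI and version number are made up, and only the outbound message is modelled, not whatever the server sends back.

```python
import json

def keystrokes_per_second(words_per_minute, chars_per_word=5):
    """Rough keystroke rate, assuming a fixed number of characters per word."""
    return words_per_minute * chars_per_word / 60

def did_change_message(uri, version, line, character, text):
    """Frame one insertion at a position as a textDocument/didChange message."""
    position = {"line": line, "character": character}
    body = json.dumps({
        "jsonrpc": "2.0",
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            "contentChanges": [
                # An empty range (start == end) means "insert text here".
                {"range": {"start": position, "end": position}, "text": text}
            ],
        },
    })
    payload = body.encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(payload) + payload

rate = keystrokes_per_second(100)
message = did_change_message("file:///tmp/example.py", 2, 0, 5, "x")
print(f"{rate:.1f} notifications per second at 100 wpm")
print(f"{len(message)} bytes on the wire for a one-character edit")
```

Every one of those messages has to be serialized, shipped across a pipe or socket, and parsed on the other side before any highlighting can happen, and that is before the reply. An in-process parser skips the whole round trip.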

So I like to think of tree sitter’s role in Emacs as the spiritual successor to what Eric Ludlam started back in the day. It’s super quick and available with several bindings; it has an S-expression-based query language; and it supports dozens of languages out of the box, with more to come. And the author’s a really friendly guy, too.

Luckily the Emacs developers are debating the role and place it’ll have in Emacs, if any. I would be surprised if it does not find a home in Emacs core. Native bindings in Emacs, and a long-term plan to rewrite the major modes that stand to benefit from it the most, are most likely going to happen in the next year or two.

ParEdit Everywhere: Meet Combobulate

ParEdit, if you don’t know it, is a supercharged minor mode for LISP-likes. It comes with a large array of tools that operate on S-expressions like merging, joining, splitting and navigating. It’s both powerful and intuitive.

I only use maybe 15% of its capabilities but it greatly speeds up the tedium of refactoring elisp. It’s also a bit of a “holy grail” of what people want in other languages.

A decade ago I hacked paredit to kinda-sorta-but-not-really work on Python (yes, seriously) and although some of the features worked it was never really going to happen, but the idea stuck with me. Now that tree sitter (and the excellent Emacs Tree Sitter package) is a “thing” I had another crack at it, but this time written from scratch to better pander to the different types of programming languages.

I call it Combobulate. You can download an alpha version and play around with it if you like. I’ve been dogfooding it for about 10-11 months and I’m finally going to stop kicking the can down the road and get it out there.

I think it works quite well. I’ll write about the many trials and tribulations of writing even just the bits you do see, as there’s a lot to be said for the way Emacs currently does navigation and editing and how to tie that in with the existing tools. I’m a big believer in Emacs’s take on navigation and editing, and I’m convinced there’s a way to merge the two worlds without breaking either localized editing or the larger structural editing that Combobulate is made for.

I’ll round it out and say that interacting with tree sitter’s concrete syntax tree makes it easy to do cool stuff with a handful of lines, but making something ergonomic and flexible that works across languages is not. That’s what took me the longest. That and making it somewhat performant.
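Here is what "cool stuff with a handful of lines" can look like, sketched in Python with the standard ast module standing in for tree sitter's tree. This is not Combobulate's code or its approach; the function names and the sample source are invented for the example, and it relies on the end position attributes that Python 3.8 and later attach to parsed nodes.

```python
import ast

def contains(node, line, col):
    """True if (line, col) falls inside the node's source span."""
    if not hasattr(node, "lineno"):
        return False  # some nodes (operators, contexts) carry no position
    start = (node.lineno, node.col_offset)
    end = (node.end_lineno, node.end_col_offset)
    return start <= (line, col) < end

def node_at(node, line, col):
    """Descend to the innermost positioned node containing the point."""
    for child in ast.iter_child_nodes(node):
        if contains(child, line, col):
            return node_at(child, line, col)
    return node

def next_statement(tree, line, col):
    """Return the top-level statement after the one containing the point."""
    for i, stmt in enumerate(tree.body):
        if contains(stmt, line, col) and i + 1 < len(tree.body):
            return tree.body[i + 1]
    return None

SOURCE = "x = 1\ny = compute(x, 2)\nz = 3\n"
tree = ast.parse(SOURCE)

# Line 2, column 12 is the argument "x" inside the call (lines are 1-based,
# columns 0-based, as in the ast module).
at_point = node_at(tree, 2, 12)
following = next_statement(tree, 2, 12)
print(type(at_point).__name__, at_point.id)
print(type(following).__name__, "on line", following.lineno)
```

That covers the easy half. Deciding what "next" should mean when the point sits in a string, a comment, or a half-typed expression, and doing so the same way across languages, is where the real work goes.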

I’ll talk about that soon enough. In the meantime, if you’re interested in playing around (and contributing, it’s easy!) then check it out and let me know what you think.
