Convert Gemini Text Format to HTML

Introduction

For my blog herrdoering.de I have so far used Hugo to turn Markdown files into HTML. What I really wanted was a setup that makes this translation step unnecessary, ideally a web browser that understands Markdown directly.

Something similar is now available with the Gemini text format and the matching browser Lagrange:

🌐 Lagrange desktop GUI client for browsing Geminispace
🌐 Wikipedia Entry about the Gemini Space
🌐 Awesome Gemini sampling (English)

Gemini as base format

Since the articles I write usually don't contain many pictures anyway, I decided to convert my blog to the Gemini format and, as long as the normal web still exists, to generate HTML pages from it. ;-)

While searching for a suitable converter, I came across the Gemini site of Omar, who has written a small awk script for this purpose. His site also links to a similar program by dracometallium:

🌐 Omar's Converter
🌐 dracometallium's gmi2html.awk

I combined parts of both programs into something that works for me and wrote a shell wrapper for it.

How it works

Change into the directory that contains the Gemini text files and run the wrapper. It creates an output directory named html one level above, which holds the generated HTML pages. Links between local Gemini files are rewritten so that the HTML pages link to each other. The last time I did anything with awk was a good 20 years ago, so there is certainly room for improvement.
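
The path mapping described above can be sketched on its own. This is only an illustration: the helper name html_path is hypothetical and not part of the wrapper, and the default ../html mirrors the output directory the wrapper uses.

```shell
#!/bin/bash
# Hypothetical helper (not part of the wrapper): map a Gemini source path,
# as printed by find, to the path of the HTML file generated for it.
html_path() {
    local src="${1#./}"             # drop the leading "./" that find prints
    local htmldir="${2:-../html}"   # output directory, one level above by default
    printf '%s/%s.html\n' "$htmldir" "${src%.gmi}"
}

html_path ./index.gmi        # prints ../html/index.html
html_path ./posts/awk.gmi    # prints ../html/posts/awk.html
```

Subdirectories are kept, so the html tree has the same layout as the Gemini tree.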

Shell Wrapper gmi2html

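#!/bin/bash

# Output directory, one level above the Gemini sources.
htmldir="../html"

# Read find's output line by line so that paths containing spaces survive.
find . -type f -name "*.gmi" | while IFS= read -r file
do
   barefile=$(basename "$file" .gmi)
   outdir=$(dirname "$htmldir/$file")
   mkdir -p "$outdir"
   outfile="$outdir/$barefile.html"

   # The title is the first level-1 heading, with surrounding blanks removed.
   title=$(grep -m 1 '^# ' "$file" | sed 's/#//' | awk '{$1=$1;print}')
   # Escape \, & and / so that sed inserts the title literally.
   title=$(printf '%s' "$title" | sed 's|[\\&/]|\\&|g')
   gem2html.awk "$file" | sed "s/TITLEEE/$title/" > "$outfile"
done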
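
The title lookup in the wrapper can be tried on its own. A minimal sketch, assuming a hypothetical helper named extract_title and a throwaway sample file; the pipeline itself follows the one in the wrapper.

```shell
#!/bin/bash
# Hypothetical helper: print the text of the first "# " heading in a file.
extract_title() {
    # grep picks the first level-1 heading, sed removes the marker,
    # and awk's $1=$1 rebuilds the line with single spaces and no outer blanks.
    grep -m 1 '^# ' "$1" | sed 's/^#//' | awk '{$1=$1; print}'
}

# Try it on a throwaway sample file.
sample=$(mktemp)
printf '%s\n' 'Some intro line' '#   My First Post  ' '# Second heading' > "$sample"
extract_title "$sample"    # prints: My First Post
rm -f "$sample"
```

Lines that start with ## or ### are skipped, because the pattern requires a space right after the first #.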

Omar's adapted gem2html.awk Converter from Gemini Text Format to HTML

#!/usr/bin/awk -f

BEGIN {
    print "<!DOCTYPE html>"
    print "<html>"
    print "<head>"
    print "    <meta charset=\"utf-8\">"
    print "    <meta name=\"referrer\" content=\"no-referrer\">"
    print "    <style>"
    print "        body{margin: auto; padding: 1em; max-width:40em; font-size: 150%;font-family: sans-serif;}"
    print "        pre {white-space:pre-wrap; background-color:#eee; margin-top:1em; margin-bottom:0; padding:0.5em; }"
    print "        hr  {color:#eee; background-color:#eee; border:#eee; height:4px; }"
    print "    </style>"
    print "    <title>TITLEEE</title>"
    print "</head>"
    print "<body>"

    in_pre = 0;
    in_list = 0;
}

!in_pre && /^```/ {
    in_pre = 1;
    if (in_list) {
       in_list = 0;
       print("</ul>");
    }
    print "<pre>";
    next
}
in_pre && /^```/    { in_pre = 0; print "</pre>"; next }
in_pre       { print san($0); next }

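# Line types, most specific prefix first so that "###" is not taken for "#".
/^###/  { output("<h3>", substr($0, 4), "</h3>"); next }
/^##/   { output("<h2>", substr($0, 3), "</h2>"); next }
/^#/    { output("<h1>", substr($0, 2), "</h1>"); next }
/^>/    { output("<blockquote>", substr($0, 2), "</blockquote>"); next }
# Gemini list items start with "* "; requiring the space keeps lines
# such as "*emphasis*" as ordinary paragraphs.
/^\* /  { output("<li>", substr($0, 2), "</li>"); next }
/^=>/   {
    # Link line: "=> URL optional label". The first field is the URL,
    # whatever remains is the label.
    $0 = substr($0, 3);
    link = $1;
    $1 = "";
    output_link(link, $0);
    next;
}
# Anything else is a paragraph; output() drops empty ones.
{ output("<p>", $0, "</p>") }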

END {
    if (in_list)
       print "</ul>"
    if (in_pre)
       print "</pre>"
    print "</body>\n</html>"
}

function trim(s) {
    sub("^[ \t]*", "", s);
    return s;
}

# Escape characters that are special in HTML text and attribute values.
function san(s) {
    gsub("&", "\\&amp;", s)
    gsub("<", "\\&lt;", s)
    gsub(">", "\\&gt;", s)
    gsub("\"", "\\&quot;", s)
    return s;
}

function output(ot, content, ct) {
    content = trim(content);

    if (!in_list && ot == "<li>") {
       in_list = 1;
       print "<ul>";
    }

    if (in_list && ot != "<li>") {
       in_list = 0;
       print "</ul>";
    }

    if (ot == "<p>" && content == "")
       return;

    printf("%s%s%s\n", ot, san(content), ct);
}

function output_link(link, content) {
    if (in_list) {
       in_list = 0;
       print "</ul>";
    }

    # If it's a local gemini file, link to the html:
    if ((link !~ /^[a-zA-Z]*:\/\//) && (link ~ /\.gmi$/)) {
        sub(/\.gmi$/, ".html", link)
    }

    if (content == "")
       content = link;

    # Escape the URL as well, since it ends up inside an attribute value.
    printf("&#x2BA9; <a href=\"%s\">%s</a><br>\n", san(link), trim(san(content)));
}
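
The local-link rule in output_link can be tried in isolation. A minimal sketch, assuming a hypothetical shell helper named rewrite_link that wraps the same two regular expressions; example.org is a placeholder host.

```shell
#!/bin/bash
# Hypothetical helper: apply the converter's link rule to a single URL.
# A link without a scheme such as "gemini://" that ends in ".gmi" is
# treated as a local file and pointed at the generated ".html" page.
rewrite_link() {
    awk -v link="$1" 'BEGIN {
        if (link !~ /^[a-zA-Z]*:\/\// && link ~ /\.gmi$/)
            sub(/\.gmi$/, ".html", link)
        print link
    }'
}

rewrite_link posts/awk.gmi                 # prints posts/awk.html
rewrite_link gemini://example.org/a.gmi    # printed unchanged
```

Links to other Gemini capsules keep their .gmi ending, so only pages that the wrapper itself generates are retargeted.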

Conclusion

It will certainly take a while until I have converted my entire blog. I am also still missing keywords and categories, which Hugo of course handles very well.

🌐 The scripts are licensed under the MIT license.
