
How secon.dev was implemented, December 2022 edition

created 2022-12-11

It has been a while since I wrote the 2020 edition of how secon.dev was implemented, and I have been thinking it might be time to work on secon.dev again. I may never get past the thinking stage, but either way I want to record the state of the site as of late 2022.

Core implementation

This part is almost unchanged from the 2020 version. I write files in Markdown, sync them through Dropbox, and a build server detects changes, builds static HTML with Next.js, and deploys it to Firebase Hosting.

For entries other than diaries, the site shows related entries at the bottom of the page, so this article should show them too. The approach is the same as in my article "I made a CLI that outputs similar documents for static site generators": compute TF-IDF weights for each article's text, use cosine similarity to find the most similar articles, and have Next.js read that data and embed it at build time. It is a very plain mechanism.
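The TF-IDF and cosine similarity idea can be sketched in a few lines of standard-library Python. This is only an illustration, not the CLI mentioned above: the function names, the whitespace tokenizer, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption for this sketch); real
    # articles would likely need a proper tokenizer for their language.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_entries(docs, target, top_n=3):
    """Rank every other document by similarity to the target document."""
    vectors = tfidf_vectors(docs)
    scores = [
        (other, cosine(vectors[target], vectors[other]))
        for other in docs
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical sample articles, keyed by slug.
docs = {
    "nextjs-build": "static site build with nextjs and firebase hosting",
    "nextjs-deploy": "deploy a static nextjs site to firebase hosting",
    "camera": "a new camera lens for night photos",
}
print(related_entries(docs, "nextjs-build", top_n=2))
```

In a setup like this, the ranked list would be written out once (for example as JSON) so that the site build only has to read it.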

Related entries work reasonably well for non-diary articles. Diaries, though, tend to contain many unrelated notes instead of a single topic. Treating a whole diary entry as one document and computing similarity from TF-IDF word occurrence does not work very well, so I do not use it there.

Articles with similar images

When an article contains a photo, the site shows articles that contain similar photos at the bottom of the individual article page. I use this navigation a lot myself. It is one of my favorite features.

During the image upload flow described below, I extract metadata and, at the same time, image features. At the moment I use EfficientNetB0. I then use those features to compute cosine similarity and pick similar images. This is also a plain mechanism, and it is almost the same as what I described in the similar image search article.
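The ranking step can be sketched as follows, assuming the feature vectors have already been extracted and stored per image. The toy 4-dimensional vectors, the file names, and the function names are hypothetical; real model features would be much longer, and the model itself is not run here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def similar_images(features, query, top_n=3):
    """Rank the other images by cosine similarity to the query image."""
    normalized = {name: l2_normalize(vec) for name, vec in features.items()}
    q = normalized[query]
    scores = []
    for name, vec in normalized.items():
        if name == query:
            continue
        # For unit-length vectors the dot product equals cosine similarity.
        scores.append((name, sum(a * b for a, b in zip(q, vec))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy vectors standing in for real extracted image features.
features = {
    "sunset-1.webp": [0.9, 0.1, 0.0, 0.2],
    "sunset-2.webp": [0.8, 0.2, 0.1, 0.1],
    "cat.webp": [0.0, 0.9, 0.7, 0.1],
}
print(similar_images(features, "sunset-1.webp", top_n=2))
```

Normalizing once up front means each comparison is a single dot product, which keeps a brute-force scan cheap for a personal site's worth of photos.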

When I built this I did not really understand image features. I still cannot say I understand them properly, though I probably know more than I did then. Back then I simply chose EfficientNetB0 because it offered good performance and was small. If I were choosing now, I would start from what kind of "similar image" would actually be useful and pick the model from there.

Diaries from the same date

This is a long-standing feature of web diary systems. It lets you look back at what happened on the same date in previous years, and it makes for very good navigation when you are writing "diaries" rather than "articles". The more content accumulates, the more useful it becomes.
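A same-date lookup like this can be sketched by grouping entries on their (month, day) pair. The function names and the sample dates below are assumptions for illustration, not the site's actual code.

```python
from collections import defaultdict
from datetime import date


def build_same_date_index(entry_dates):
    """Group diary dates by (month, day), newest year first."""
    index = defaultdict(list)
    for d in entry_dates:
        index[(d.month, d.day)].append(d)
    for dates in index.values():
        dates.sort(reverse=True)
    return index


def same_date_entries(index, target):
    """Return diaries written on the same month and day in other years."""
    return [d for d in index.get((target.month, target.day), []) if d != target]


# Hypothetical diary dates.
entries = [
    date(2020, 12, 11),
    date(2021, 12, 11),
    date(2022, 12, 10),
    date(2022, 12, 11),
]
index = build_same_date_index(entries)
print(same_date_entries(index, date(2022, 12, 11)))
```

Because the index depends only on dates, it can be built once at build time and reused for every diary page.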

Image upload

In the 2020 version I used Hatena Fotolife as the image upload destination. Hatena Fotolife later removed its paid upload option, so I moved image storage to Google Cloud Storage (GCS).

When I upload an image file to a specific GCS bucket, a Cloud Function converts JPEG to WebP, resizes it to frequently used sizes, and extracts metadata. The image feature extraction mentioned above also happens there.
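One part of such a function is deciding which resized variants to produce for an uploaded file. The sketch below covers only that planning step, with no image decoding or GCS calls; the target widths, the `_w<width>` naming scheme, and the function name are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical "frequently used" widths in pixels.
TARGET_WIDTHS = (320, 640, 1280)


def plan_variants(object_name, width, height, target_widths=TARGET_WIDTHS):
    """List (output_name, new_width, new_height) for each WebP variant."""
    path = PurePosixPath(object_name)
    variants = []
    for target in target_widths:
        if target >= width:
            continue  # never upscale beyond the original width
        # Preserve the aspect ratio when scaling down.
        new_height = round(height * target / width)
        out = path.with_name(f"{path.stem}_w{target}.webp")
        variants.append((str(out), target, new_height))
    # Always include a full-size WebP copy of the original.
    variants.append((str(path.with_suffix(".webp")), width, height))
    return variants


for variant in plan_variants("photos/2022/sunset.jpg", 1000, 667):
    print(variant)
```

Keeping this step as a pure function makes it easy to test separately from the code that actually encodes and uploads the images.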

Uploading files to a specific GCS bucket sounds troublesome at first, but I mount the bucket as a Windows local filesystem using the method described in "Mount a GCS bucket as a Windows filesystem". That means I can develop a photo in Lightroom, save it, and quickly get the various image sizes the site needs.

Because all files are on GCS, it is also convenient when I want to bring photos back to my local machine and do something with them. I can fetch them with something like gsutil -m rsync .... I am glad I got around to building this setup.

Cost

secon.dev does not get a lot of traffic, so Firebase Hosting for the website, GCS for image hosting, and Cloud Functions together cost 49 yen including tax for November 2022.

That figure does not include the build server. The build server runs on a VPS that I also use for many other things, so the true total is probably a little higher.

Future implementation direction

secon.dev is currently a static build, and the data is filesystem-based: Markdown text plus JSON metadata. This approach has become inefficient enough that I now want a data store that lets me keep references between pieces of information as I build the data, whether that means GraphQL, an RDB, or something else. It is not strictly necessary, though, so I am still not sure what to do.

For machine learning features, if more things could be calculated dynamically, I could do more with the site, such as building my own search. I am also thinking about that area.

I am still interested in machine learning these days, and I would like to combine experiments in that area with secon.dev. I will probably choose technologies and architecture that fit that direction.