Japanese Input on Slackware
I have been looking for the least intrusive way to enter Japanese text on my main Slackware box, without messing with the rest of the system. I am used to running with $LANG set to "en_US", which is the Slackware default, and with Latin-1/ISO-8859-1 as the encoding in the filesystem and so on.
The goal is to enter Japanese text into Mozilla Firefox and (G)Vim, under the Fluxbox window manager.
Fortunately, those two programs are GTK-based and start SCIM automatically as long as certain environment variables are set. Here is how it's done:
#!/bin/sh
export LC_CTYPE='ja_JP.utf8'
export GTK_IM_MODULE="scim-bridge"
firefox
In that Firefox session, Control + Space can be used to switch between input methods.
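The same idea could be generalized to any GTK program, GVim included. Here is a rough Python 3 sketch of that, assuming the two variables from the wrapper script are all that is needed; the names im_environment and launch are made up for this example.

```python
import os
import subprocess

# Values taken from the shell wrapper above.
IM_SETTINGS = {
    "LC_CTYPE": "ja_JP.utf8",
    "GTK_IM_MODULE": "scim-bridge",
}


def im_environment(base_env=None):
    """Return a copy of the environment with the input method variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(IM_SETTINGS)
    return env


def launch(command):
    """Hypothetical helper: start a program with the modified environment."""
    return subprocess.Popen(command, env=im_environment())
```

Something like launch(["gvim"]) would then start GVim with the same settings, while the rest of the session keeps its normal locale.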
XSPF Coverage Dump
Here is an alternative XSPF coverage check program, based on concepts from the previous one. This script dumps all files in a directory structure, color coded by how many times they are referenced from one or more XSPF playlists. This makes it easy to see which files are included in any playlist, and whether they are included more than once.
Take a look:
#!/usr/bin/python

import xml.dom.minidom
import re
import os
import getopt

xspf_files = dict()
fs_files = list()

def print_usage(progname):
    print "Usage: %s [options] <directory> <xspf file> ... <xspf file>" % (progname)
    print "Options:"
    print "  -c   No ANSI color coding"
    print "  -n   No count prefix"

def xspf_parse(playlist_filename, handler):
    xml_data = xml.dom.minidom.parse(playlist_filename)
    for playlist in xml_data.getElementsByTagName("playlist"):
        for tracklist in playlist.getElementsByTagName("trackList"):
            for track in tracklist.getElementsByTagName("track"):
                for location in track.getElementsByTagName("location"):
                    # Undo the %XX escapes in the location.
                    data = re.sub("%([0-9a-fA-F]{2})", \
                        lambda x: chr(int(x.group(1), 16)), \
                        location.firstChild.data.encode("utf-8"))
                    track_filename = data.decode("utf-8").replace("file://", "")
                    handler(playlist_filename, track_filename)

def add_xspf_file(playlist_filename, track_filename):
    if not track_filename in xspf_files:
        xspf_files[track_filename] = list()
    xspf_files[track_filename].append(playlist_filename)

if __name__ == "__main__":
    import sys

    try:
        opts, args = getopt.getopt(filter(None, sys.argv[1:]), "hcn", ["help", "no-color", "no-count"])
    except getopt.GetoptError as err:
        print str(err)
        print_usage(sys.argv[0])
        sys.exit(1)

    if len(args) < 2:
        print_usage(sys.argv[0])
        sys.exit(1)

    print_color = True
    print_count = True
    for o, a in opts:
        if o in ("-h", "--help"):
            print_usage(sys.argv[0])
            sys.exit(1)
        elif o in ("-c", "--no-color"):
            print_color = False
        elif o in ("-n", "--no-count"):
            print_count = False

    for filename in args[1:]:
        xspf_parse(filename, add_xspf_file)

    for root, dirs, files in os.walk(args[0]):
        for filename in files:
            fs_files.append(os.path.join(root, filename).decode("iso-8859-1"))

    for fs_file in sorted(fs_files):
        if fs_file in xspf_files:
            count = len(xspf_files[fs_file])
            if count > 1:
                # Referenced more than once: bold green.
                if print_count:
                    sys.stdout.write("%d " % (count))
                if print_color:
                    sys.stdout.write("\x1B[32;1m")
                sys.stdout.write(fs_file.encode("iso-8859-1"))
                if print_color:
                    sys.stdout.write("\x1B[0m")
                sys.stdout.write("\n")
            else:
                # Referenced exactly once: plain green.
                if print_count:
                    sys.stdout.write("1 ")
                if print_color:
                    sys.stdout.write("\x1B[32m")
                sys.stdout.write(fs_file.encode("iso-8859-1"))
                if print_color:
                    sys.stdout.write("\x1B[0m")
                sys.stdout.write("\n")
        else:
            # Not referenced at all: no color.
            if print_count:
                sys.stdout.write("0 ")
            sys.stdout.write(fs_file.encode("iso-8859-1"))
            sys.stdout.write("\n")

    sys.exit(0)
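The counting and coloring rule can be looked at separately from the XML parsing. The following is a small Python 3 sketch of just that part, assuming the track paths have already been extracted; count_references and format_entry are illustrative names, not functions from the script.

```python
from collections import Counter

GREEN = "\x1b[32m"
BOLD_GREEN = "\x1b[32;1m"
RESET = "\x1b[0m"


def count_references(playlists):
    """Count how many times each path appears across a list of track lists."""
    counts = Counter()
    for tracks in playlists:
        counts.update(tracks)
    return counts


def format_entry(path, count, color=True, show_count=True):
    """Format one output line: optional count prefix, then the colored path."""
    prefix = "%d " % count if show_count else ""
    if not color or count == 0:
        return prefix + path          # unreferenced files stay uncolored
    start = BOLD_GREEN if count > 1 else GREEN
    return prefix + start + path + RESET
```

As in the script, the count prefix is written before the color escape, so only the filename itself is colored.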
100ish Posts
Dear anonymous reader,
It has been 8 years, 1 month and 15 days since I made the first post on this website. That's close to 3000 days ago...
I have made one hundred posts during that time. The article ID found in the link for each post is a little off, because a couple of posts were deleted in the early stages. I have decided not to rectify this, since it would break all the search engine results!
It seems that I have made:
* 60 "Scripts and Code" posts, for smaller projects where code is embedded in the post itself.
* 20 "Open Source" posts, for larger projects where the code is downloadable through a link.
* 16 "Configuration" posts, which are technical but not directly related to code.
* 3 "Mundane" posts, which deal with meatspace topics.
* 1 "General" post, which is actually the first post...
I'm still aiming for 1 post per month, and that can hopefully continue. The main disruptions are caused by lengthy business trips to North America or East Asia.
Buildroot for 486
Here are the steps I used in order to get a brand new Linux version 4 kernel running on an old 486 DX computer. The setup is based on the Buildroot system for making embedded systems. The end result is the kernel with a very simple BusyBox based userland.
All the steps are based on the 2015.05 version of Buildroot, downloaded from http://buildroot.uclibc.org/downloads/buildroot-2015.05.tar.gz
Get the three necessary configuration files from Here.
First of all, copy the configuration files into Buildroot and prepare it:
mkdir board/i486
cp i486_defconfig configs/
cp linux-4.0.4-i486.defconfig board/i486
cp extlinux.conf board/i486
make i486_defconfig
For any further changes to the configuration, use:
make menuconfig
make linux-menuconfig
Then build:
make
I had to use extlinux as a bootloader, since neither syslinux nor GRUB worked correctly in my case. A Compact Flash card was inserted as /dev/sdd, and set up like this:
sudo fdisk /dev/sdd # One primary partition, active, uses all space.
sudo sh -c 'cat /usr/share/syslinux/mbr.bin > /dev/sdd' # The redirect must run as root too.
sudo mkfs.ext2 /dev/sdd1
Once the CF card is prepared, the files can be copied over, with paths relative to the Buildroot folder:
mount /mnt/sdd1
sudo tar xf ./output/images/rootfs.tar -C /mnt/sdd1
sudo mkdir /mnt/sdd1/boot
sudo cp ./output/images/bzImage /mnt/sdd1/boot/
sudo cp ./board/i486/extlinux.conf /mnt/sdd1/boot/
sudo extlinux -i /mnt/sdd1/boot
sync
umount /mnt/sdd1
The card is then ready to be used.
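Before touching the card, it can be worth checking that the build actually produced the two images that get copied. A minimal Python 3 sketch of such a check, assuming the file names used in the commands above; missing_images is a hypothetical helper, not part of Buildroot.

```python
import os

# File names taken from the copy commands above.
REQUIRED_IMAGES = ("rootfs.tar", "bzImage")


def missing_images(buildroot_dir):
    """Return the expected build products that are absent from output/images."""
    image_dir = os.path.join(buildroot_dir, "output", "images")
    return [name for name in REQUIRED_IMAGES
            if not os.path.isfile(os.path.join(image_dir, name))]
```

An empty list from missing_images(".") in the Buildroot folder would mean both files are in place.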
Katakana to ASCII Converter
The Japanese writing system of Katakana is typically used to represent text from foreign languages. This means it's possible to transliterate it directly into Latin letters, and still be able to understand some of the meaning.
So, I made this C-based filter to convert UTF-8 encoded Katakana text to ASCII. Take a look:
#include <stdio.h>

typedef struct unicode_s {
  int code;
  char text[4]; /* Up to three letters plus the terminating NUL. */
} unicode_t;

#define KATAKANA_SIZE 96

static unicode_t katakana[KATAKANA_SIZE] = {
  {0x30A0, "="},   {0x30A1, "a"},   {0x30A2, "a"},   {0x30A3, "i"},
  {0x30A4, "i"},   {0x30A5, "u"},   {0x30A6, "u"},   {0x30A7, "e"},
  {0x30A8, "e"},   {0x30A9, "o"},   {0x30AA, "o"},   {0x30AB, "ka"},
  {0x30AC, "ga"},  {0x30AD, "ki"},  {0x30AE, "gi"},  {0x30AF, "ku"},
  {0x30B0, "gu"},  {0x30B1, "ke"},  {0x30B2, "ge"},  {0x30B3, "ko"},
  {0x30B4, "go"},  {0x30B5, "sa"},  {0x30B6, "za"},  {0x30B7, "shi"},
  {0x30B8, "ji"},  {0x30B9, "su"},  {0x30BA, "zu"},  {0x30BB, "se"},
  {0x30BC, "ze"},  {0x30BD, "so"},  {0x30BE, "zo"},  {0x30BF, "ta"},
  {0x30C0, "da"},  {0x30C1, "chi"}, {0x30C2, "di"},  {0x30C3, "tsu"},
  {0x30C4, "tsu"}, {0x30C5, "dzu"}, {0x30C6, "te"},  {0x30C7, "de"},
  {0x30C8, "to"},  {0x30C9, "do"},  {0x30CA, "na"},  {0x30CB, "ni"},
  {0x30CC, "nu"},  {0x30CD, "ne"},  {0x30CE, "no"},  {0x30CF, "ha"},
  {0x30D0, "ba"},  {0x30D1, "pa"},  {0x30D2, "hi"},  {0x30D3, "bi"},
  {0x30D4, "pi"},  {0x30D5, "fu"},  {0x30D6, "bu"},  {0x30D7, "pu"},
  {0x30D8, "he"},  {0x30D9, "be"},  {0x30DA, "pe"},  {0x30DB, "ho"},
  {0x30DC, "bo"},  {0x30DD, "po"},  {0x30DE, "ma"},  {0x30DF, "mi"},
  {0x30E0, "mu"},  {0x30E1, "me"},  {0x30E2, "mo"},  {0x30E3, "ya"},
  {0x30E4, "ya"},  {0x30E5, "yu"},  {0x30E6, "yu"},  {0x30E7, "yo"},
  {0x30E8, "yo"},  {0x30E9, "ra"},  {0x30EA, "ri"},  {0x30EB, "ru"},
  {0x30EC, "re"},  {0x30ED, "ro"},  {0x30EE, "wa"},  {0x30EF, "wa"},
  {0x30F0, "wi"},  {0x30F1, "we"},  {0x30F2, "wo"},  {0x30F3, "n"},
  {0x30F4, "vu"},  {0x30F5, "ka"},  {0x30F6, "ke"},  {0x30F7, "va"},
  {0x30F8, "vi"},  {0x30F9, "ve"},  {0x30FA, "vo"},  {0x30FB, "."},
  {0x30FC, "-"},   {0x30FD, ","},   {0x30FE, ","},   {0x30FF, "|"},
};

/* Count the leading 1-bits of a byte: 0 for ASCII, 1 for a continuation
   byte, N for the first byte of an N-byte sequence. */
static int multibyte_len(unsigned char byte)
{
  int len = 0;
  while (len < 8 && (byte & (0x80 >> len))) {
    len++;
  }
  return len;
}

/* Extract the payload bits that follow the leading 1-bits and the 0-bit.
   Returns -1 for ASCII bytes and for bytes without any payload bits. */
static int multibyte_data(unsigned char byte)
{
  int len = multibyte_len(byte);
  if (len == 0 || len >= 7) {
    return -1;
  }
  return byte & (0x7f >> len);
}

static char *katakana_to_ascii(int unicode)
{
  int i;
  for (i = 0; i < KATAKANA_SIZE; i++) {
    if (katakana[i].code == unicode) {
      return katakana[i].text;
    }
  }
  return "?";
}

int main(void)
{
  int c, in_utf8, len, unicode;

  in_utf8 = 0;
  len = 0;
  unicode = 0;
  while ((c = fgetc(stdin)) != EOF) {
    if (c & 0x80) { /* If multibyte character... */
      if (in_utf8) {
        unicode = unicode << (7 - multibyte_len(c)); /* Shift existing... */
        unicode = unicode | multibyte_data(c); /* ...then add new bits. */
        len--;
        if (len <= 0) {
          fputs(katakana_to_ascii(unicode), stdout);
          in_utf8 = 0;
        }
      } else {
        in_utf8 = 1;
        len = multibyte_len(c) - 1; /* More multibytes to read. */
        unicode = multibyte_data(c);
      }
    } else {
      in_utf8 = 0;
      fputc(c, stdout);
    }
  }

  return 0;
}
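The bit handling is easier to follow on a single character. Below is a Python 3 sketch of the same decoding idea, assuming well-formed UTF-8 input; the function names are made up for this example, and real code would normally just call bytes.decode("utf-8").

```python
def leading_ones(byte):
    """Count the 1-bits at the top of a byte: 0 = ASCII, 1 = continuation,
    N = first byte of an N-byte sequence."""
    count = 0
    while count < 8 and byte & (0x80 >> count):
        count += 1
    return count


def decode_code_points(data):
    """Decode well-formed UTF-8 bytes into a list of code points."""
    code_points = []
    remaining = 0
    value = 0
    for byte in data:
        ones = leading_ones(byte)
        if ones == 0:                      # plain ASCII byte
            code_points.append(byte)
        elif ones == 1:                    # continuation: six payload bits
            value = (value << 6) | (byte & 0x3F)
            remaining -= 1
            if remaining == 0:
                code_points.append(value)
        else:                              # first byte: 7 - ones payload bits
            value = byte & (0x7F >> ones)
            remaining = ones - 1
    return code_points
```

For the three bytes E3 82 AB this gives 0x30AB, which the table above maps to "ka".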
Shift Encryption Filter
Here is a re-implementation of a couple of programs I made around 10 years ago. It's a standard in/out filter that performs simple shift encryption, of the Caesar or Vigenere variants.
It supports both text and binary mode. Text mode only operates on A to Z, while binary mode operates on the whole 8-bit range of a character byte.
Take a look at the code and compile it:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

typedef enum {
  CIPHER_NONE,
  CIPHER_CAESAR,
  CIPHER_VIGENERE,
} cipher_t;

/* Named filter_mode_t since the POSIX headers already define mode_t. */
typedef enum {
  MODE_NONE,
  MODE_TEXT,
  MODE_BINARY,
} filter_mode_t;

typedef enum {
  DIRECTION_NONE,
  DIRECTION_ENCRYPT,
  DIRECTION_DECRYPT,
} direction_t;

static inline int binary_shift(int c, int n)
{
  c += n;
  if (c > 255) {
    c -= 256;
  }
  return c;
}

static inline int binary_unshift(int c, int n)
{
  c -= n;
  if (c < 0) {
    c += 256;
  }
  return c;
}

static inline int text_shift(int c, int n)
{
  if (c >= 65 && c <= 90) { /* 'A' to 'Z' */
    c += n;
    if (c > 90)
      c -= 26;
  }
  if (c >= 97 && c <= 122) { /* 'a' to 'z' */
    c += n;
    if (c > 122)
      c -= 26;
  }
  return c;
}

static inline int text_unshift(int c, int n)
{
  if (c >= 65 && c <= 90) { /* 'A' to 'Z' */
    c -= n;
    if (c < 65)
      c += 26;
  }
  if (c >= 97 && c <= 122) { /* 'a' to 'z' */
    c -= n;
    if (c < 97)
      c += 26;
  }
  return c;
}

static void caesar_filter(int shift_amount, filter_mode_t mode,
  direction_t direction)
{
  int c;

  while ((c = fgetc(stdin)) != EOF) {
    if (mode == MODE_TEXT) {
      if (direction == DIRECTION_ENCRYPT) {
        c = text_shift(c, shift_amount);
      } else { /* DIRECTION_DECRYPT */
        c = text_unshift(c, shift_amount);
      }
    } else { /* MODE_BINARY */
      if (direction == DIRECTION_ENCRYPT) {
        c = binary_shift(c, shift_amount);
      } else { /* DIRECTION_DECRYPT */
        c = binary_unshift(c, shift_amount);
      }
    }
    fputc(c, stdout);
  }
}

static void vigenere_filter(char *key, filter_mode_t mode,
  direction_t direction)
{
  int c;
  char *p;

  p = &key[0];
  while ((c = fgetc(stdin)) != EOF) {
    /* The key is expected to be lowercase: 'a' shifts by 0, 'z' by 25. */
    if (mode == MODE_TEXT) {
      if (direction == DIRECTION_ENCRYPT) {
        c = text_shift(c, *p - 97);
      } else { /* DIRECTION_DECRYPT */
        c = text_unshift(c, *p - 97);
      }
    } else { /* MODE_BINARY */
      if (direction == DIRECTION_ENCRYPT) {
        c = binary_shift(c, *p - 97);
      } else { /* DIRECTION_DECRYPT */
        c = binary_unshift(c, *p - 97);
      }
    }
    fputc(c, stdout);

    /* Advance the key for every input byte, wrapping at the end. */
    p++;
    if (*p == '\0')
      p = &key[0];
  }
}

static void display_help(char *progname)
{
  fprintf(stderr, "Usage: %s <options>\n", progname);
  fprintf(stderr, "Options:\n"
    "  -h        Display this help and exit.\n"
    "  -c SHIFT  Use Caesar cipher, with SHIFT.\n"
    "  -v KEY    Use Vigenere cipher, with KEY.\n"
    "  -t        Text mode. (Operate on 'a-z' and 'A-Z' only.)\n"
    "  -b        Binary mode. (Operate on 0-255 byte range.)\n"
    "  -e        Encrypt. (Forward shift.)\n"
    "  -d        Decrypt. (Reverse shift.)\n"
    "\n");
}

int main(int argc, char *argv[])
{
  int c;
  filter_mode_t mode = MODE_NONE;
  cipher_t cipher = CIPHER_NONE;
  direction_t direction = DIRECTION_NONE;
  char *vigenere_key = NULL;
  int caesar_shift = 0;

  while ((c = getopt(argc, argv, "hc:v:tbed")) != -1) {
    switch (c) {
    case 'h':
      display_help(argv[0]);
      return EXIT_SUCCESS;

    case 'c':
      cipher = CIPHER_CAESAR;
      caesar_shift = atoi(optarg);
      break;

    case 'v':
      cipher = CIPHER_VIGENERE;
      vigenere_key = optarg;
      break;

    case 't':
      mode = MODE_TEXT;
      break;

    case 'b':
      mode = MODE_BINARY;
      break;

    case 'e':
      direction = DIRECTION_ENCRYPT;
      break;

    case 'd':
      direction = DIRECTION_DECRYPT;
      break;

    case '?':
    default:
      display_help(argv[0]);
      return EXIT_FAILURE;
    }
  }

  if (mode == MODE_NONE) {
    fprintf(stderr, "Error: Specify text or binary mode.\n");
    display_help(argv[0]);
    return EXIT_FAILURE;
  }

  if (direction == DIRECTION_NONE) {
    fprintf(stderr, "Error: Specify encryption or decryption.\n");
    display_help(argv[0]);
    return EXIT_FAILURE;
  }

  switch (cipher) {
  case CIPHER_CAESAR:
    caesar_filter(caesar_shift, mode, direction);
    break;

  case CIPHER_VIGENERE:
    vigenere_filter(vigenere_key, mode, direction);
    break;

  default:
    fprintf(stderr, "Error: Specify a cipher to use.\n");
    display_help(argv[0]);
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}
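To see what text mode does to a short string, here is a rough Python 3 sketch of the two ciphers. It is not a port of the filter: it uses modulo arithmetic instead of the single wrap-around in the C code, and the function names are only for illustration. Like the C version, it assumes a lowercase key and advances the key on every input character, letter or not.

```python
def shift_letter(ch, amount):
    """Shift one character within A-Z or a-z; anything else passes through."""
    if "A" <= ch <= "Z":
        base = ord("A")
    elif "a" <= ch <= "z":
        base = ord("a")
    else:
        return ch
    return chr(base + (ord(ch) - base + amount) % 26)


def caesar(text, shift, decrypt=False):
    """Apply the same shift to every letter."""
    amount = -shift if decrypt else shift
    return "".join(shift_letter(ch, amount) for ch in text)


def vigenere(text, key, decrypt=False):
    """Key letters a-z give shifts 0-25; the key advances on every character."""
    out = []
    for i, ch in enumerate(text):
        amount = ord(key[i % len(key)]) - ord("a")
        out.append(shift_letter(ch, -amount if decrypt else amount))
    return "".join(out)
```

For example, caesar("Hello, World!", 3) gives "Khoor, Zruog!", and decrypting with the same shift restores the original text.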
XSPF Coverage and Duplication Check
Two new XSPF playlist use cases have come to mind: checking for duplicate file references across playlists, and checking for coverage. By coverage, I mean checking whether all files within a directory structure are actually referenced by the playlist(s).
Both scripts are based on the XSPF integrity check script I made earlier, and the same parser is used.
Script for duplication check:
#!/usr/bin/python

import xml.dom.minidom
import re
import os.path

xspf_files = dict()

def xspf_parse(playlist_filename, handler):
    xml_data = xml.dom.minidom.parse(playlist_filename)
    for playlist in xml_data.getElementsByTagName("playlist"):
        for tracklist in playlist.getElementsByTagName("trackList"):
            for track in tracklist.getElementsByTagName("track"):
                for location in track.getElementsByTagName("location"):
                    # Undo the %XX escapes in the location.
                    data = re.sub("%([0-9a-fA-F]{2})", \
                        lambda x: chr(int(x.group(1), 16)), \
                        location.firstChild.data.encode("utf-8"))
                    track_filename = data.decode("utf-8").replace("file://", "")
                    handler(playlist_filename, track_filename)

def file_check(playlist_filename, track_filename):
    if track_filename in xspf_files:
        print track_filename, "-->", xspf_files[track_filename], "&", playlist_filename
    else:
        xspf_files[track_filename] = playlist_filename

if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print "Usage: %s <xspf file> ... <xspf file>" % (sys.argv[0])
        sys.exit(1)

    for filename in sys.argv[1:]:
        xspf_parse(filename, file_check)

    sys.exit(0)
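The two interesting steps here are turning a location into a plain path and remembering which playlist mentioned it. A small Python 3 sketch of both, assuming the locations have already been pulled out of the XML; it swaps the hand-written regex for urllib.parse from the standard library, and find_duplicates is an illustrative name.

```python
from urllib.parse import unquote, urlparse


def location_to_path(location):
    """Convert a 'file://' location into a plain path, undoing %XX escapes."""
    return unquote(urlparse(location).path)


def find_duplicates(playlists):
    """playlists maps a playlist name to its list of track locations.
    Returns {path: [playlist names]} for paths referenced more than once."""
    seen = {}
    for name, locations in playlists.items():
        for location in locations:
            seen.setdefault(location_to_path(location), []).append(name)
    return {path: names for path, names in seen.items() if len(names) > 1}
```

Unlike the script, which prints each clash as it is found, this collects every playlist that references a path, so a file listed three times shows up once with three names.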
Script for coverage check:
#!/usr/bin/python

import xml.dom.minidom
import re
import os

xspf_files = set()
fs_files = set()

def xspf_parse(playlist_filename, handler):
    xml_data = xml.dom.minidom.parse(playlist_filename)
    for playlist in xml_data.getElementsByTagName("playlist"):
        for tracklist in playlist.getElementsByTagName("trackList"):
            for track in tracklist.getElementsByTagName("track"):
                for location in track.getElementsByTagName("location"):
                    # Undo the %XX escapes in the location.
                    data = re.sub("%([0-9a-fA-F]{2})", \
                        lambda x: chr(int(x.group(1), 16)), \
                        location.firstChild.data.encode("utf-8"))
                    track_filename = data.decode("utf-8").replace("file://", "")
                    handler(playlist_filename, track_filename)

def add_xspf_file(playlist_filename, track_filename):
    xspf_files.add(track_filename)

if __name__ == "__main__":
    import sys

    if len(sys.argv) < 3:
        print "Usage: %s <directory> <xspf file> ... <xspf file>" % (sys.argv[0])
        sys.exit(1)

    for root, dirs, files in os.walk(sys.argv[1]):
        for filename in files:
            fs_files.add(os.path.join(root, filename).decode("iso-8859-1"))

    for filename in sys.argv[2:]:
        xspf_parse(filename, add_xspf_file)

    # Files both on disk and in a playlist, relative to all files on disk.
    fs_covered = float(len(fs_files.intersection(xspf_files)))
    fs_total = float(len(fs_files))
    print "Coverage: %.2f%%" % ((fs_covered / fs_total) * 100)

    print "Missing Files:"
    for filename in fs_files.difference(xspf_files):
        print filename

    sys.exit(0)
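The coverage figure itself is plain set arithmetic. A minimal Python 3 sketch, assuming both collections hold comparable path strings; the guard for an empty directory is an addition that the script above does not have.

```python
def coverage(fs_files, xspf_files):
    """Return (percentage covered, sorted list of unreferenced files)."""
    fs_files = set(fs_files)
    xspf_files = set(xspf_files)
    if not fs_files:
        return 0.0, []  # nothing on disk; avoid dividing by zero
    covered = fs_files & xspf_files
    missing = sorted(fs_files - xspf_files)
    return 100.0 * len(covered) / len(fs_files), missing
```

Playlist entries that point at files outside the directory are simply ignored, since only the files found on disk count towards the total.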
Filename Sanitizer
Here is a Python script to sanitize filenames that will be transferred to a Windows file system. This script will recursively go through a directory and replace the bad characters with underscores. Not sure if this script knows about all the bad ones, but it worked in my case at least.
Check it out:
#!/usr/bin/python

import os
import sys

bad_characters = r'?<>\:*|"'
replacement = '_'

if len(sys.argv) < 2:
    print "Usage: %s <directory> [yes]" % (sys.argv[0])
    sys.exit(1)

# Only rename when "yes" is given, otherwise just list what would change.
do_it = False
if len(sys.argv) > 2:
    if sys.argv[2] == 'yes':
        do_it = True

for directory, dirnames, filenames in os.walk(sys.argv[1], topdown=False):
    for name in filenames + dirnames:
        newname = name
        for bad in bad_characters:
            newname = newname.replace(bad, replacement)
        if newname != name:
            oldpath = os.path.join(directory, name)
            newpath = os.path.join(directory, newname)
            print "%s -> %s" % (oldpath, newpath)
            if do_it:
                os.rename(oldpath, newpath)
Storage Chart
Here is a small curses program that can visualize disk space usage, among other things. It's meant to be used together with the "du" command like this: "du -s * | this-program"
The program has a few shortcomings, and it could probably have been implemented without the use of curses. Note that Ctrl-C needs to be used to quit the program, since it reads the data from standard input, which blocks other input from the keyboard. An "other" category is used for excess input (anything beyond the first 23 lines), but this can get too large. In that case, sort the input before passing it to the program like so: "du -s * | sort -n -r | this-program".
The visual format is similar to that of WinDirStat/KDirStat. Here's an example:

Observe the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <curses.h>

#define TEXT_X_AREA 20
#define TEXT_MAX (TEXT_X_AREA - 3)
#define DATA_MAX 24 /* Best fit when using 80x24 terminal. */
#define DATA_DELIMITER "\t"

typedef struct data_s {
  double value;
  char text[TEXT_MAX + 1];
} data_t;

/* Wrapper with the signature atexit() expects. */
static void screen_exit(void)
{
  endwin();
}

static void screen_init(int *max_y, int *max_x)
{
  initscr();
  atexit(screen_exit);
  if (has_colors()) {
    start_color();
    init_pair(1, COLOR_RED, COLOR_BLACK);
    init_pair(2, COLOR_GREEN, COLOR_BLACK);
    init_pair(3, COLOR_BLUE, COLOR_BLACK);
    init_pair(4, COLOR_YELLOW, COLOR_BLACK);
    init_pair(5, COLOR_MAGENTA, COLOR_BLACK);
    init_pair(6, COLOR_CYAN, COLOR_BLACK);
    init_pair(7, COLOR_WHITE, COLOR_BLACK);
  }
  noecho();
  getmaxyx(stdscr, *max_y, *max_x);
}

static void box_draw(int pattern, int start_y, int start_x,
  int size_y, int size_x)
{
  int y, x;

  for (y = 0; y < size_y; y++) {
    for (x = 0; x < size_x; x++) {
      /* Seven colors, then the same seven again in bold. */
      if ((pattern % 14) > 6) {
        wattrset(stdscr, A_BOLD | COLOR_PAIR((pattern % 7) + 1));
      } else {
        wattrset(stdscr, COLOR_PAIR((pattern % 7) + 1));
      }
      mvaddch(start_y + y, start_x + x, pattern + 0x41);
    }
  }
  refresh();
}

static void text_draw(int pattern, char *text, int y, int x)
{
  if ((pattern % 14) > 6) {
    wattrset(stdscr, A_BOLD | COLOR_PAIR((pattern % 7) + 1));
  } else {
    wattrset(stdscr, COLOR_PAIR((pattern % 7) + 1));
  }
  mvprintw(y, x, "%c:%s", pattern + 0x41, text);
  refresh();
}

static int data_compare(const void *p1, const void *p2)
{
  double v1 = ((const data_t *)p1)->value;
  double v2 = ((const data_t *)p2)->value;

  /* Descending order: negative, zero or positive as qsort() requires. */
  return (v1 < v2) - (v1 > v2);
}

int main(int argc, char *argv[])
{
  int i, max_y, max_x, start_y, start_x, size_y, size_x, horizontal;
  double sum, share, used, value;
  char line[128], *p;
  data_t data[DATA_MAX];

  for (i = 0; i < DATA_MAX; i++) {
    data[i].value = 0.0;
    data[i].text[0] = '\0';
  }
  strncpy(data[DATA_MAX - 1].text, "*OTHER*", TEXT_MAX);

  i = 0;
  while (fgets(line, sizeof(line), stdin) != NULL) {
    p = strtok(line, DATA_DELIMITER);
    if (p == NULL)
      continue;
    value = atof(p);

    p = strtok(NULL, DATA_DELIMITER);
    if (p == NULL)
      continue;
    p[strcspn(p, "\n")] = '\0'; /* Drop the trailing newline. */

    if (i >= (DATA_MAX - 1)) {
      data[DATA_MAX - 1].value += value; /* Goes into "other". */
    } else {
      data[i].value = value;
      strncpy(data[i].text, p, TEXT_MAX);
      data[i].text[TEXT_MAX] = '\0';
      i++;
    }
  }

  /* Sort to get largest value first. */
  qsort(data, DATA_MAX, sizeof(data_t), data_compare);

  sum = 0.0;
  for (i = 0; i < DATA_MAX; i++) {
    if (data[i].value <= 0.0)
      break;
    sum += data[i].value;
  }

  if (sum <= 0.0) {
    return 1; /* Will divide by zero, abort. */
  }

  screen_init(&max_y, &max_x);
  max_x -= TEXT_X_AREA;

  used = 0.0;
  horizontal = start_y = start_x = 0;
  size_y = max_y;
  size_x = max_x;

  for (i = 0; i < DATA_MAX; i++) {
    if (data[i].value <= 0.0)
      break;

    /* Share of the area that is still free. */
    share = data[i].value / (sum - used);

    if (horizontal) {
      size_x = max_x - start_x;
      size_y = share * (double)(max_y - start_y);
      box_draw(i, start_y, start_x, size_y, size_x);
      start_y += size_y;
      horizontal = 0;
    } else {
      size_y = max_y - start_y;
      size_x = share * (double)(max_x - start_x);
      box_draw(i, start_y, start_x, size_y, size_x);
      start_x += size_x;
      horizontal = 1;
    }

    used += data[i].value;
  }

  for (i = 0; i < DATA_MAX; i++) {
    if (data[i].value <= 0.0)
      break;
    text_draw(i, data[i].text, i, max_x + 1);
  }

  move(0,0);
  refresh();
  while (1) {
    sleep(1); /* Ctrl-C to quit! */
  }

  return 0;
}
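The layout logic can be tried without a terminal. Below is a Python 3 sketch of the same alternating split, assuming a list of positive sizes and a character grid of the given width and height; layout is an illustrative name, and nothing is drawn.

```python
def layout(values, width, height):
    """Return (x, y, w, h) boxes for values, largest first, alternating
    vertical and horizontal cuts of the area that is still free."""
    values = sorted((v for v in values if v > 0), reverse=True)
    remaining = float(sum(values))
    x = y = 0
    horizontal = False
    boxes = []
    for value in values:
        share = value / remaining      # share of the free area
        if horizontal:
            w = width - x
            h = int(share * (height - y))
            boxes.append((x, y, w, h))
            y += h
        else:
            h = height - y
            w = int(share * (width - x))
            boxes.append((x, y, w, h))
            x += w
        remaining -= value
        horizontal = not horizontal
    return boxes
```

With sizes 50, 25 and 25 on a 60x24 grid, the first box takes the left half, and the other two split the right half into an upper and a lower part.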