awkium: an awk interpreter



The features of awkium are based on those of the original awk.


awkium

awkium scans the input line by line and applies the given action to each line.
An awkium program consists of the following elements.

program := program-element [program]
program-element := pattern action | function variable-name([argument ...]) { statement-list }


patterns

awkium supports the following patterns.
BEGIN
applied before any file is read.

END
applied after all files have been read.

/regular expression/
applied when the input line matches the regular expression.

awkium expression
applied when the expression is true (not 0).

pattern1&&pattern2
applied when both pattern1 and pattern2 are true.

pattern1||pattern2
applied when either of the given patterns is true.

!pattern1
applied when pattern1 is false (0).

(pattern1)
groups pattern1 to change the order of evaluation.

pattern1,pattern2
applied to every line from a line that matches pattern1 through a line that matches pattern2.
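
As a sketch of how these patterns combine, the following program uses only the pattern forms listed above. The counter names in_block and code are illustrative and not part of awkium, and the result described afterwards is what standard awk produces.

```awk
# Range pattern: counts the lines from "# begin" through "# end", inclusive.
/^# begin/, /^# end/ { in_block++ }

# Combined pattern: counts non-empty lines that are not comments.
!/^#/ && length($0) > 0 { code++ }

# Adding 0 forces a numeric result even if a counter was never set.
END { print in_block + 0, code + 0 }
```

For an input of six lines (x = 1, # begin, y = 2, an empty line, # end, z = 3), the range covers four lines and three lines count as code, so the program prints 4 3.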


awkium expression

Operators of awkium are shown below.
operator              associativity  description
function () { ... }                  creates an anonymous function (awkium extension)
()                                   calls a function
[n]                                  refers to an element of an array
$                                    refers to a field
++ --                                increments/decrements a variable
^                     right          raises the left value to the power of the right value (a^b)
unary+ unary- !       left           unary plus, unary minus, or logical negation
* / %                 left           multiplication, division, or remainder
+ -                   left           addition or subtraction
(concatenation)       left           concatenates strings
> < >= <= == !=       left           compares two values
~ !~                  left           checks whether the left value matches (does not match) the right regular expression
in                    -              checks whether the left value is contained in the array
&&                    left           logical AND
||                    left           logical OR
? :                                  yields the second value if the first value is true, otherwise the third value
= += -= *= /= %= ^=   right          assigns the right value to the left variable
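
The following sketch illustrates a few of these rules. It assumes that the table lists operators from highest to lowest precedence, as in standard awk; the variable names are illustrative, and the values in the comments are those produced by standard awk.

```awk
BEGIN {
    power = 2 ^ 3 ^ 2          # right-associative: 2 ^ (3 ^ 2)
    neg   = -2 ^ 2             # ^ binds tighter than unary minus: -(2 ^ 2)
    text  = 1 + 2 " " 3 * 4    # arithmetic is done before concatenation
    print power, neg, text     # 512 -4 3 12

    n = 5
    parity = (n % 2 == 0) ? "even" : "odd"
    print parity               # odd

    seen["a"] = 1
    has_a = ("a" in seen)      # 1: "a" is in the array
    has_b = ("b" in seen)      # 0: "b" is not in the array
    print has_a, has_b         # 1 0
}
```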


awkium statement

awkium supports the following statements.
{ statement, ... }
groups statements into a block.

if(expression) statement1 [else statement2]
executes statement1 if expression is true; otherwise executes statement2, if present.

while(expression) statement
executes statement repeatedly while expression is true.

do statement while(expression)
executes statement repeatedly while expression is true. statement is executed at least once.

for([expression1]; [expression2]; [expression3]) statement
First, expression1 is evaluated. Next, statement is executed repeatedly while expression2 is true. expression3 is evaluated after each execution of statement, including when continue is executed inside statement.

for(variable in array) statement
executes statement once for each element of array. The index of each element is assigned to variable.

continue
skips to the next iteration of the innermost enclosing loop.

break
leaves the innermost enclosing loop.
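
A minimal sketch of the loop statements above, using continue and break inside a for loop. The function name sum_even is hypothetical, and declaring i and total as extra parameters is the standard awk convention for local variables, assumed here to work in awkium as well.

```awk
# sum_even(limit): add the even numbers from 1 up to limit, stopping early
# once the running total exceeds 20.
function sum_even(limit,    i, total) {
    total = 0
    for (i = 1; i <= limit; i++) {
        if (i % 2 != 0)
            continue      # expression3 (i++) still runs before the next test
        total += i
        if (total > 20)
            break         # leaves the for loop immediately
    }
    return total
}

BEGIN {
    print sum_even(6)     # 2 + 4 + 6 = 12
    print sum_even(100)   # 2 + 4 + 6 + 8 + 10 = 30, then break
}
```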

next
stops processing the current line and moves on to the next input line.

exit expression
exits awkium. The exit code can be specified by expression.

return expression
returns from an awkium function. The return value can be specified by expression.

print [expression, ...]
prints the given expressions, or the current line if no expressions are given.
Pipes and redirection are available.
> filename     writes to the file
>> filename    appends to the file
| command      pipes the output to the given command

printf format, [expression, ...]
prints the given expressions according to format.
Pipes and redirection are available.

getline [variable]
reads a line and assigns it to variable, or to $0 if variable is not specified.
Pipes and redirection are available.
getline < filename    reads from the file
command | getline     reads from the standard output of command (pipe)
The return codes are as follows.
1     normal (a line was read)
0     end-of-file
-1    error
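
Because getline returns -1 on error, a loop that only tests for a nonzero result would never end when the file cannot be read. The following sketch checks for a result greater than 0 instead. The function name count_lines and the file path are hypothetical.

```awk
# count_lines(file): return the number of lines in file, or -1 if the file
# cannot be read.
function count_lines(file,    line, n, status) {
    n = 0
    # The parentheses are needed so that "<" is read as a redirection.
    while ((status = (getline line < file)) > 0)
        n++
    close(file)
    return (status < 0) ? -1 : n
}

BEGIN {
    print count_lines("/nonexistent/awkium-example")   # -1, assuming no such file exists
}
```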

system(command)
executes command and returns its exit code.

close(file)
closes file.


built-in functions

awkium provides the following built-in functions.
gsub(regexp, substitute, [variable])
replaces every string that matches regexp with substitute. $0 is used if variable is not specified.
The character "&" in substitute stands for the matched string.

sub(regexp, substitute, [variable])
replaces the first string that matches regexp with substitute. $0 is used if variable is not specified.
The character "&" in substitute stands for the matched string.
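
A short sketch of the difference between gsub and sub, and of "&" in the replacement text. The variable names are illustrative; the results in the comments are those of standard awk.

```awk
BEGIN {
    s = "hello world"
    gsub(/o/, "[&]", s)     # every "o" is replaced; & stands for the match
    print s                 # hell[o] w[o]rld

    t = "hello world"
    sub(/l+/, "<&>", t)     # only the first run of "l" is replaced
    print t                 # he<ll>o world
}
```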

index(dest, string)
returns the position at which string first appears in dest. The first character of a string is character number 1.

length(string)
returns the length of string.

match(dest, regexp)
returns the index at which regexp first matches dest. RSTART is set to the index of the first match, and RLENGTH is set to the length of that match.

split(string, variable[, regexp])
splits string into the array variable, using regexp as the separator, or FS if regexp is not specified.

sprintf(format, [expression, ...])
formats the given expressions according to format and returns the result.
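
As a sketch of split and sprintf used together, the following function rearranges a date string. The function name us_date is hypothetical, and the element numbering from 1 follows standard awk.

```awk
# us_date(iso): convert "YYYY-MM-DD" to "MM/DD/YYYY", padding with zeros.
function us_date(iso,    parts) {
    split(iso, parts, "-")     # parts[1] = year, parts[2] = month, parts[3] = day
    return sprintf("%02d/%02d/%04d", parts[2], parts[3], parts[1])
}

BEGIN {
    print us_date("2024-3-7")  # 03/07/2024
}
```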

substr(string, index, length)
returns the substring of string that starts at index and is length characters long. The first character of a string is character number 1.
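
The values that match stores in RSTART and RLENGTH are exactly the index and length that substr expects, so the two are often combined. In this sketch the function name first_number is hypothetical, and it assumes that match returns 0 when nothing matches, as in standard awk.

```awk
# first_number(s): return the first run of digits in s, or "" if there is none.
function first_number(s) {
    if (match(s, /[0-9]+/))
        return substr(s, RSTART, RLENGTH)
    return ""
}

BEGIN {
    print first_number("order 1234 shipped")   # 1234
    print index("foobar", "bar")               # 4
    print length("foobar")                     # 6
}
```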

toupper(string)
converts string to upper case.

tolower(string)
converts string to lower case.

systime()
returns the current time as the number of seconds since the UNIX epoch.

strftime([format, [time]])
formats time according to a strftime(3) format. The following format specifiers are available.
%a abbreviated weekday name
%A full weekday name
%b abbreviated month name
%B full month name
%c date and time representation of the locale
%d the day of the month (01-31)
%H the hour (00-23)
%I the hour (01-12)
%j the day of the year (001-366)
%m the month (01-12)
%M the minute (00-59)
%p the AM/PM designation
%S the second (00-59)
%U the week of the year (the first Sunday as the first day of week one: 00-53)
%w the weekday (Sunday is 0: 0-6)
%W the week of the year (the first Monday as the first day of week one: 00-53)
%x the date representation of the locale
%X the time representation of the locale
%y the year modulo 100 (00-99)
%Y the year
%Z the name of the timezone of the locale
%% the character '%'
%D identical to %m/%d/%y
%e the day of the month
%h identical to %b
%n newline
%r identical to %I:%M:%S %p
%R identical to %H:%M
%T identical to %H:%M:%S
%t tab character
%k the hour (0-23)
%l the hour (1-12)
%C the century
%u the weekday (Monday is 1: 1-7)
%z the timezone in +HHMM format
%f the abbreviated era of the Japanese imperial calendar (H: Heisei, S: Showa, T: Taisho, M: Meiji).
%F the full era of the Japanese imperial calendar.
%E the year of the era of the Japanese imperial calendar.
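
A minimal sketch of systime and strftime used together. Both functions are extensions that are not part of POSIX awk, so this example only runs on interpreters that provide them; the output depends on the current time and locale and is therefore not shown.

```awk
BEGIN {
    now = systime()                            # seconds since the UNIX epoch
    stamp = strftime("%Y-%m-%d %H:%M:%S", now) # date and time, 19 characters
    print stamp
    print strftime("%A, %B %d", now)           # weekday, month name, day
}
```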


Virtual files

awkium provides the following virtual files. The filenames are the same regardless of OS.
/dev/stdin
represents standard input

0 (number)
represents standard input

/dev/stdout
represents standard output

1 (number)
represents standard output

/dev/stderr
represents standard error output

2 (number)
represents standard error output

/dev/null
discards any output


namespaces/pseudo object-oriented programming

awkium supports namespaces, which help avoid conflicts between variable names.
By duplicating a namespace, awkium also supports prototype-based object-oriented programming.
namespace1.namespace2.new() creates an instance, that is, a duplicate of the namespace.
namespace1.namespace2.name
refers to the variable name in the namespace namespace1.namespace2.

function namespace1.namespace2.name ([args, ...]) { ... }
defines the function name in the namespace namespace1.namespace2.

function namespace1.namespace2.new ([args, ...]) { ... }
defines the "constructor" of the namespace namespace1.namespace2.

instance.name
refers to the variable name in the instance instance.


First-class function objects

awkium supports first-class function objects.
function ([args, ...]) { ... }
You can create an anonymous function by using a function statement inside an expression. This corresponds to the lambda special form of Lisp.
An anonymous function keeps the namespace in which it was defined, as a closure.
For example,
BEGIN {
  aaa = function() {
    a = 0
    return function() {
      return a++
    }
  }
  bbb = aaa()
  print bbb()
  print bbb()
  print a
}
displays
0
1
(empty line)

@function-name
refers to function-name in the namespace of functions. This corresponds to #' of Common Lisp.


operations on arrays

awkium supports operations on whole arrays.
The result of applying an operator @ to arrays a and b is defined for each index i as follows.
  1. a[i] @ b[i] if i is an index of both a and b
  2. a[i] @ (undefined) if i is an index of a only
  3. (undefined) @ b[i] if i is an index of b only
When an operator is applied to an array a and a scalar b, it is applied to each element of a together with b.
When an operator is applied to a scalar a and an array b, it is applied to a together with each element of b.

Example: when
a[1] = 1, a[2] = 2, a[3] = 3
b[2] = 3, b[3] = 4, b[4] = 5
x = 2
the results are
c = a + b → c[1] = 1, c[2] = 5, c[3] = 7, c[4] = 5
c = a - b → c[1] = 1, c[2] = -1, c[3] = -1, c[4] = -5
c = a * x → c[1] = 2, c[2] = 4, c[3] = 6
These results show that an awkium array can be treated as an element of a "vector space".



Example: when
a[1] = 1, a[2] = 2, a[3] = 3
b[2] = 3, b[3] = 1, b[4] = 5
the results are
c = a < b → c[1] = 0, c[2] = 1, c[3] = 0, c[4] = 1
c = a > b → c[1] = 1, c[2] = 0, c[3] = 1, c[4] = 0
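
To make rules 1 to 3 concrete, the following sketch computes the same result as c = a + b using only standard awk, where whole-array operators do not exist. The function name array_add is hypothetical; this models the rules described above and is not awkium's implementation.

```awk
# array_add(a, b, c): store in c the element-wise sum of a and b, taken over
# every index that appears in either array. A missing element counts as
# (undefined), which behaves as 0 in arithmetic.
function array_add(a, b, c,    i) {
    split("", c)                                # empty the result array
    for (i in a)                                # rules 1 and 2
        c[i] = a[i] + ((i in b) ? b[i] : 0)
    for (i in b)                                # rule 3
        if (!(i in a))
            c[i] = b[i]
}

BEGIN {
    a[1] = 1; a[2] = 2; a[3] = 3
    b[2] = 3; b[3] = 4; b[4] = 5
    array_add(a, b, c)
    print c[1], c[2], c[3], c[4]                # 1 5 7 5
}
```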


fixed field

Fixed-width fields are enabled by assigning the value "fixed" to the variable SEPMODE. The width of each field is specified with the variable FS, as a list of widths separated by spaces, as follows.
FS = "width1 width2 ..."

Assigning the empty string to SEPMODE restores the usual field separation.

Example: Given the following script
BEGIN {
	SEPMODE = "fixed"
	FS = "7 6 5"
}
{ print $1, $2, $3 }
and the following input
123456712345612345
aaaaaaabbbbbbccccc
awkium displays
1234567 123456 12345
aaaaaaa bbbbbb ccccc
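
For comparison, the same splitting can be sketched in standard awk with substr. The function name split_fixed is hypothetical, and the sketch assumes that split returns the number of elements, as in standard awk; it only models the behavior shown in the example above.

```awk
# split_fixed(line, widths, out): cut line into consecutive fields whose
# lengths are given by the space-separated list widths (e.g. "7 6 5").
# Returns the number of fields stored in out[1], out[2], ...
function split_fixed(line, widths, out,    w, n, i, pos) {
    n = split(widths, w, " ")
    pos = 1
    for (i = 1; i <= n; i++) {
        out[i] = substr(line, pos, w[i])   # field i starts at pos
        pos += w[i]                        # next field starts right after it
    }
    return n
}

BEGIN {
    split_fixed("123456712345612345", "7 6 5", f)
    print f[1], f[2], f[3]                 # 1234567 123456 12345
}
```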


fixed record

Fixed-length records are enabled by assigning the value "fixed" to the variable RMODE.
The width of each record is specified by assigning it to the variable RS.

Example: Given the following script
BEGIN {
	RMODE = "fixed"
	RS = "10"
}
{ print }
and the following input
12345678900987654321[eof]
awkium displays
1234567890
0987654321


binary record

Binary records are enabled by assigning the value "binary" to the variable RMODE.
The width of each record is specified by assigning it to the variable RS.

Example: print the filenames stored in a tar archive.
BEGIN {
	RMODE = "binary"
	RS = "512"
	SEPMODE = "fixed"
	FS = "100 8 8 8 12 12 8 1 100 5 3 32 32 8 8 12 12"
}
$10 == "ustar" {
	print b.tostring($1)
}
Warning: This program does not work with every tar archive.


property field

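Property-style fields, like those of a Java properties file, are enabled by assigning the value "property" to the variable SEPMODE.
The field separator is specified by assigning it to the variable FS. As the examples show, each line is split only at the first occurrence of the separator.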

Example1: Given the following script
BEGIN {
	SEPMODE = "property"
	FS = "="
}
{ print $1, $2 }
and the following input
bbbb=cccc
aaaa=bbbb=cccc
awkium displays
bbbb cccc
aaaa bbbb=cccc
Example2: Given the following script
BEGIN {
	SEPMODE = "property"
	FS = "=+"
}
{ print $1, $2 }
and the following input
bbbb=cccc
aaaa===bbbb=cccc
awkium displays
bbbb cccc
aaaa bbbb=cccc
Example3: Given the following script
BEGIN {
	SEPMODE = "property"
	FS = " "
}
{ print $1, $2 }
and the following input (^ represents a space and ^I represents a tab)
bbbb^^Icccc
aaaa^^I^bbbb^^cccc
awkium displays
bbbb^cccc
aaaa^bbbb^^cccc
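
The splitting shown in these examples can be sketched in standard awk with match and substr. The function name split_property is hypothetical, and the handling of a line without any separator is an assumption, since the examples above do not cover that case.

```awk
# split_property(line, sep, kv): split line at the first match of the
# regular expression sep. kv[1] receives the text before the separator and
# kv[2] everything after it; later separators stay inside kv[2].
function split_property(line, sep, kv,    rest) {
    if (match(line, sep)) {
        kv[1] = substr(line, 1, RSTART - 1)
        rest = RSTART + RLENGTH                    # first character after the separator
        kv[2] = substr(line, rest, length(line) - rest + 1)
    } else {
        kv[1] = line                               # assumption: no separator, no value
        kv[2] = ""
    }
}

BEGIN {
    split_property("aaaa===bbbb=cccc", "=+", kv)
    print kv[1], kv[2]                             # aaaa bbbb=cccc
}
```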