Bytescript: a data manipulation software

Data manipulation is at the core of any artificial intelligence. Any AI must easily consume, validate, compose and transform any kind of data. This is what bytescript is all about.

Feb 07, 2023

General idea

The bytescript software is basically a scripting language that explains the schema of the data that it can take as input. That schema is called a grammar. Once data is combined with a grammar, an abstract syntax tree (AST) is created.

Using the language, we can then fetch elements from the AST in order to extract portions of its data, and add a prefix and suffix to the extracted data. It therefore gives an easy tool to extract and compose data from an AST.

Each script is called a “recipe”. Each recipe can contain input and output parameters. When executed, the input parameters are provided to the script for use within its scope, and the output parameters are returned after its execution.

The following is a detailed explanation of its syntax and features.

End of line, or End of Instruction

The bytescript language does not understand spaces, tabs and line returns. Therefore, the semicolon (;) is used to tell the language that a line, and therefore an instruction, is complete.

Comments

Comments in the bytescript language always begin with two (2) slashes. A comment can then contain any value and ends with an End of Instruction’s semicolon (;).

// this is my comment;
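To make the two rules above concrete, here is a minimal Python sketch (not the actual bytescript parser) that splits a script into instructions. It assumes that whitespace only separates words, that a semicolon never appears inside an instruction or a comment, and the function name `split_instructions` is purely illustrative.

```python
def split_instructions(source: str) -> list[str]:
    """Split a script on semicolons, dropping comments and empty pieces."""
    instructions = []
    for piece in source.split(";"):
        # Collapse spaces, tabs and line returns: they carry no meaning here.
        text = " ".join(piece.split())
        # A piece that starts with two slashes is a comment, so skip it.
        if text and not text.startswith("//"):
            instructions.append(text)
    return instructions


script = "// this is my comment;\n-> myInput;\n<-\n   myOutput;"
print(split_instructions(script))  # prints ['-> myInput', '<- myOutput']
```

Note how the line return inside the second parameter declaration changes nothing: only the semicolon ends the instruction.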

Variable name

A variable name can contain either numbers only, or camelCase letters. Here are two (2) examples of valid variable names:

myName
1
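As a rough illustration, the naming rule could be checked with a regular expression like the one below. This is only a sketch in Python: it assumes that “camelCase letters” means ASCII letters starting with a lowercase one, and that digits and letters cannot be mixed. Neither assumption is confirmed by the language itself.

```python
import re

# Either digits only, or letters only starting with a lowercase letter (assumed).
VARIABLE_NAME = re.compile(r"[0-9]+|[a-z][a-zA-Z]*")


def is_valid_variable_name(name: str) -> bool:
    """Return True when the whole name follows the assumed naming rule."""
    return VARIABLE_NAME.fullmatch(name) is not None


print(is_valid_variable_name("myName"))   # prints True
print(is_valid_variable_name("1"))        # prints True
print(is_valid_variable_name("my_name"))  # prints False
```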

Parameters

A script can contain input and output parameters. Basically, input parameters are the parameters passed to the script from outside its scope. The output parameters are the variables that will be returned out of its scope.

Input Parameter

The input parameter begins with an arrow composed of a hyphen (-) and a greater-than (>) sign. Then it contains the variable name and ends with the End of Instruction’s semicolon (;).

-> myInput;

Output Parameter

The output parameter begins with an arrow composed of a less-than (<) sign and a hyphen (-). Then it contains the variable name and ends with the End of Instruction’s semicolon (;).

<- myOutput;

Constant

The constant construct provides a way to compose byte values one after another.

Entry point

The entry point of a constant begins with the ‘@’ character, followed by its token name.

@myToken;

Token

A token is basically a specific byte value that is assigned to a name, or elements that consist of previously declared tokens.

Byte Value

Each byte is a value between 0 and 255. They can be assigned to a token using a colon (:). Each token can then be re-used as elements in order to compose a more-complex token.

Element

Each element is, by default, produced only once. If the token contains a byte value, it will return that value once. If a token contains a series of bytes, it will, by default, return that series of bytes once.

To duplicate a given element, use the open curly bracket ({) character, then write the number of times you want to duplicate that element, then close it using the close curly bracket (}).

myElement{25}

Example

When executed, the following constant would return these bytes: [21, 68, 65, 79, 79, 79, 79, 79] which, read as hexadecimal values, are the ASCII codes of the characters: !heyyyyy

// this is the entry point;
@exclHey;

// this is the byte representation of !heyyyyy;
exclHey: exclamationPoint lH lE lY{5};

// those are individual byte tokens;
exclamationPoint: 21;
lH: 68;
lE: 65;
lY: 79;
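The expansion of that constant can be mimicked in a few lines of Python. This is only a sketch of the idea, not how bytescript is implemented: the dictionary layout and the `expand_constant` name are made up for illustration, and the byte values are written with a `0x` prefix because they are the hexadecimal ASCII codes.

```python
def expand_constant(tokens: dict, name: str) -> bytes:
    """Return the bytes produced by the token called `name`."""
    definition = tokens[name]
    if isinstance(definition, int):
        # A byte token returns its single value once.
        return bytes([definition])
    produced = bytearray()
    for element, count in definition:
        # Each element is produced `count` times, like lY{5}.
        produced += expand_constant(tokens, element) * count
    return bytes(produced)


tokens = {
    "exclHey": [("exclamationPoint", 1), ("lH", 1), ("lE", 1), ("lY", 5)],
    "exclamationPoint": 0x21,
    "lH": 0x68,
    "lE": 0x65,
    "lY": 0x79,
}

print(expand_constant(tokens, "exclHey").decode("ascii"))  # prints !heyyyyy
```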

Grammar

The grammar construct contains the acceptable schema of a representation of data. The syntax is very similar to that of the constant, but it contains possibilities that are acceptable to be matched against data, instead of composing data itself.

Entry point

The entry point of a grammar begins with the ‘@’ character, followed by its token name.

@myToken;

Channels

Channels are the characters that will be omitted while creating the AST. For example, if you don’t want spaces to appear in the AST output of a grammar’s execution on data, simply put the spaces in channels.

A channel begins with a hyphen, followed by its token name.

Channels can also contain previous and/or next tokens, but they are optional. If provided, the tokens that match a channel will be omitted only if the previous/next characters match the previous/next token.

-mySpaceChanToken;
-mySpace [myPrev];
-mySpace [,myNext];
-mySpace [myPrev, myNext];
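Here is how I picture the effect of a channel, written as a small Python sketch. It is a simplification under stated assumptions: the channel, previous and next tokens are each a single byte, and the `strip_channel` helper is a hypothetical name, not part of bytescript.

```python
from typing import Optional


def strip_channel(data: bytes, channel: int,
                  prev: Optional[int] = None,
                  nxt: Optional[int] = None) -> bytes:
    """Drop channel bytes, honoring the optional previous/next conditions."""
    kept = bytearray()
    for i, byte in enumerate(data):
        prev_ok = prev is None or (i > 0 and data[i - 1] == prev)
        next_ok = nxt is None or (i + 1 < len(data) and data[i + 1] == nxt)
        if byte == channel and prev_ok and next_ok:
            continue  # omitted from the output
        kept.append(byte)
    return bytes(kept)


SPACE, COMMA = 0x20, 0x2C
print(strip_channel(b"a b, c", SPACE))              # prints b'ab,c'
print(strip_channel(b"a b, c", SPACE, prev=COMMA))  # prints b'a b,c'
```

In the second call, only the space that follows a comma is omitted; the other one stays in the data.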

Token

A token is basically a specific byte value that is assigned to a name, or elements that consist of previously declared tokens.

Element

Each element is, by default, requested only once. If the token contains a byte value, it will require that value once in order to match. If a token contains a series of bytes, it will, by default, require that series of bytes once.

To change the cardinality of an element, you must use one of these keywords:

‘?’ will match if the element is present zero or one time (single-optional)
‘*’ will match if the element is present 0+ times (multi-optional)
‘+’ will match if the element is present 1+ times (multi-mandatory)
[2] will match if the element is present exactly two (2) times
[2,] will match if the element is present two (2) or more times
[,3] will match if the element is present at most three (3) times
[2,4] will match if the element is present between two (2) and four (4) times
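The cardinality keywords above all boil down to a minimum and a maximum number of occurrences. The Python sketch below shows that reading. It is an illustration only: the function names are hypothetical, and it assumes that a missing lower bound, as in [,3], means zero.

```python
import math


def cardinality_bounds(suffix: str) -> tuple[int, float]:
    """Translate a cardinality keyword into (minimum, maximum) occurrences."""
    simple = {"": (1, 1), "?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}
    if suffix in simple:
        return simple[suffix]
    if suffix.startswith("[") and suffix.endswith("]"):
        body = suffix[1:-1]
        if "," not in body:
            exact = int(body)
            return (exact, exact)
        low, high = body.split(",")
        # An omitted bound is assumed to mean 0 (lower) or unbounded (upper).
        return (int(low) if low.strip() else 0,
                int(high) if high.strip() else math.inf)
    raise ValueError(f"unknown cardinality: {suffix!r}")


def accepts(suffix: str, occurrences: int) -> bool:
    """Tell whether a number of occurrences satisfies the cardinality."""
    low, high = cardinality_bounds(suffix)
    return low <= occurrences <= high


print(accepts("[2,4]", 3))  # prints True
print(accepts("[2,]", 1))   # prints False
print(accepts("?", 2))      # prints False
```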

Token possibilities

Each token contains a list of possibilities. Each possibility is separated using the pipe (|) character.

Therefore, this token declaration:

myToken: firstElement secondElement
       | secondElement
       ;

… matches exactly the same as this one:

myToken: firstElement? secondElement;
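If you want to convince yourself that the two declarations are equivalent, a brute-force check is enough. The Python sketch below treats each declaration as a plain function over a sequence of element names, which is my own modeling and not the way bytescript matches data, and compares them on every sequence of up to three elements.

```python
from itertools import product

ELEMENTS = ("firstElement", "secondElement")


def matches_possibilities(sequence: tuple) -> bool:
    """myToken: firstElement secondElement | secondElement;"""
    return sequence in (("firstElement", "secondElement"), ("secondElement",))


def matches_optional(sequence: tuple) -> bool:
    """myToken: firstElement? secondElement;"""
    # Consume the optional first element when present, then require the second.
    rest = sequence[1:] if sequence[:1] == ("firstElement",) else sequence
    return rest == ("secondElement",)


# Compare both declarations on every sequence of 0 to 3 elements.
for length in range(4):
    for sequence in product(ELEMENTS, repeat=length):
        assert matches_possibilities(sequence) == matches_optional(sequence)
print("both declarations accept the same sequences")
```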

Token types

Byte Value

Each byte is a value between 0 and 255. They can be assigned to a token using a colon (:). Each token can then be re-used as elements in order to compose a more-complex token.

Therefore, a token could contain a single byte

myToken: 45;

Constant

To understand how a constant is formed, please refer to the previous section.

Everything except

Everything except matches only when the opposite of the token matches. To create an everything except token, simply use the hash (#) character. If you want to use an escape token, simply add it using the exclamation point (!) after the everything except declaration.

// everything except '!', with '$' as escape character;
first: #exclamationPoint !dollarSign;

// everything except '!';
second: #exclamationPoint;

// byte values;
exclamationPoint: 21;
dollarSign: 24;

Therefore, the first token would match this data:

this is some data $! and it continues

But the second token would not match it.
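The difference between the two tokens can be sketched in Python as follows. This is my simplified reading of the rule, with single-byte tokens and a hypothetical `matches_everything_except` helper: the excluded byte makes the match fail unless the escape byte comes right before it. The sample data uses '$' (0x24) as the escape, like the dollarSign token.

```python
from typing import Optional


def matches_everything_except(data: bytes, excluded: int,
                              escape: Optional[int] = None) -> bool:
    """Return True when `data` never contains an unescaped excluded byte."""
    i = 0
    while i < len(data):
        escaped = (escape is not None and data[i] == escape
                   and i + 1 < len(data) and data[i + 1] == excluded)
        if escaped:
            i += 2  # skip the escape byte and the byte it protects
            continue
        if data[i] == excluded:
            return False
        i += 1
    return True


EXCLAMATION_POINT, DOLLAR_SIGN = 0x21, 0x24
data = b"this is some data $! and it continues"
print(matches_everything_except(data, EXCLAMATION_POINT, DOLLAR_SIGN))  # prints True
print(matches_everything_except(data, EXCLAMATION_POINT))               # prints False
```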

Adding tests to a token

At the end of each token, tests can be provided. Each test consists of providing a list of constants that are valid or invalid for the token. The tests that are valid must match and the tests that are invalid must NOT match.

The test suite begins with its delimiter, which consists of three (3) hyphens.

myToken: myFirst mySecond+ myThird[2,3]
       | myFirst myThird[2,4]
       ---
       valid: myValidConstant;
       invalid: myInvalidConstant
              & mySecondInvalidConstant
              ;
       ;

In the previous example, the test suite validates that the constant ‘myValidConstant’ must match the token, while these two (2) constants must NOT match it: ‘myInvalidConstant’ and ‘mySecondInvalidConstant’.
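In other words, a test suite is a list of expectations checked against the token. Here is a minimal Python sketch of such a check. The `run_token_tests` name, the stand-in matcher and the sample bytes are all hypothetical; in bytescript the matcher would be the token itself and the data would come from the executed constants.

```python
from typing import Callable


def run_token_tests(matches: Callable[[bytes], bool],
                    valid: dict, invalid: dict) -> list[str]:
    """Return one message per failed expectation (empty list means success)."""
    failures = []
    for name, data in valid.items():
        if not matches(data):
            failures.append(f"{name} should match")
    for name, data in invalid.items():
        if matches(data):
            failures.append(f"{name} should NOT match")
    return failures


# Stand-in matcher for illustration: accepts data that starts with '!'.
def starts_with_exclamation(data: bytes) -> bool:
    return data.startswith(b"!")


failures = run_token_tests(
    starts_with_exclamation,
    valid={"myValidConstant": b"!hey"},
    invalid={"myInvalidConstant": b"hey", "mySecondInvalidConstant": b"?hey"},
)
print(failures)  # prints []
```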

Query

Assignments

The three (3) previously explained constructs can be assigned to a variable. The construct’s data must be surrounded by open and close curly brackets ({}).

// this is a grammar declaration;
myGrammar = {
    @myRoot;
    ...
};

// this is a constant declaration;
myConstant = {
    @myRoot;
    ...
};

// this is a query declaration;
myQuery = {
    ...
};

Program

A program is basically the recipe that composes the parameter declarations, the construct declarations (constants, grammars and queries) and their executions.

Executions

Each of the three (3) constructs explained above can be executed. Each one of them will return its own type of data. To execute a construct, simply put the plus (+) sign before its variable.

Constant Execution

The constant execution does not need any parameter. The execution will return the data (bytes) that composes the constant.

// declare the parameters;
<- data;

// declare the constant;
myConstant = {...};

// execute the constant and return the data;
data = +myConstant;

Grammar Execution

The grammar execution needs to receive data (bytes) as input. The execution will return an AST that contains the data tokenized using the grammar’s token names.

// declare the parameters;
-> data;
<- ast;

// declare the grammar;
myGrammar = {...};

// execute the grammar on the input data and return the AST;
ast = +myGrammar data;

Query Execution

The query execution needs to receive an AST as input. The execution will return data (bytes) fetched from the AST and composed as declared in the query construct.

// declare the parameters;
-> input;
<- output;

// declare the grammar;
myGrammar = {...};

// execute the grammar on the input data and keep the AST;
myAST = +myGrammar input;

// declare the query;
myQuery = {...};

// execute the query on the AST and return the output;
output = +myQuery myAST;

Program Execution

The program execution can optionally receive variables as input. The execution will return the output variables as declared by its parameters.

Please note that there is always a matching process for input and output variables when there are multiple variables passed or returned.

// declare the parameters;
-> first;
-> second;
<- third;
<- fourth;

// declare the sub-program;
myProgram = {
    -> firstIn;
    -> secIn;
    <- outFirst;
    <- outSec;
};

// execute the program;
(
  third: outFirst,
  fourth: outSec
) = +myProgram (
  first: firstIn,
  second: secIn
);
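The matching process is simply a renaming of variables on the way in and on the way out. The Python sketch below models it with dictionaries. It is an analogy rather than the real execution engine: `execute_program` and the body of `my_program` are invented for illustration, since the sub-program above has no body.

```python
from typing import Callable


def execute_program(program: Callable[[dict], dict], outer_scope: dict,
                    input_map: dict, output_map: dict) -> dict:
    """Run `program`, renaming variables as declared by the two mappings."""
    # first: firstIn means "pass my `first` as the program's `firstIn`".
    inner_inputs = {inner: outer_scope[outer] for outer, inner in input_map.items()}
    inner_outputs = program(inner_inputs)
    # third: outFirst means "store the program's `outFirst` in my `third`".
    return {outer: inner_outputs[inner] for outer, inner in output_map.items()}


def my_program(scope: dict) -> dict:
    # Hypothetical body: concatenate the inputs, and echo the second one.
    return {"outFirst": scope["firstIn"] + scope["secIn"], "outSec": scope["secIn"]}


result = execute_program(
    my_program,
    outer_scope={"first": b"ab", "second": b"cd"},
    input_map={"first": "firstIn", "second": "secIn"},
    output_map={"third": "outFirst", "fourth": "outSec"},
)
print(result)  # prints {'third': b'abcd', 'fourth': b'cd'}
```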

Type casting

Each of the four (4) constructs explained above (constant, grammar, query and program) can be easily converted to data (bytes). To cast the constructs to bytes, simply use the pipe (|) keyword.

// declare the output parameter;
<- output;

// declare a grammar construct;
myGrammar = {...};

// convert the grammar to bytes and return it;
output = |myGrammar;

What’s next

To make people work in synergy with this software, we need a way for people to…

Request recipes from other people
Compose someone else’s recipe into our own
Query a database to discover recipes of others
Request a recipe to be created by someone else
Let someone host recipes of others and deliver them when requested
Pay for recipes
Receive payment from customers that buy recipes
Refer customers and/or contributors to the ecosystem and receive payment
Automatically pay people that work in synergy without human interference
Moderate the ecosystem

My next blog post will explain how this ecosystem will work. It uses various blockchain databases and uses the concept of non-fungible tokens (NFT), fungible tokens and decentralized autonomous organizations (DAO).

In the meantime, if you have questions regarding bytescript, please post a comment here and I’ll answer your question/concern and update the article if needed.

Stay tuned!

EDIT: Here's the description of the different actors the bytescript ecosystem will be composed of

Steve's Care
