Bytescript: a data manipulation software

Feb 7, 2023

Data manipulation is at the core of any artificial intelligence. Any AI must easily consume, validate, compose and transform any kind of data. This is what bytescript is all about.


11 Comments

Preston

Feb 13, 2023

I’m confused by the exact use case you envision bytescript filling. I’ll use a simple example to help illustrate my point.

Let’s say I have a website that lets users create accounts. Then my client might send the data for a new account as a JSON object to the server. It would look something like this:

{
  "userName": "myUser",
  "password": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
  "email": "myemail@email.com"
}

When the server receives that JSON object, it can create a new entry in a database using that data. It can either store the JSON directly in a document database like MongoDB, or grab the individual fields and use them to populate a SQL query that stores them in a SQL DB like MySQL.
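For reference, the conventional flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite (from the standard library) stands in for MySQL, and the table name, column names, and the store_account helper are hypothetical, not taken from any real service.

```python
import json
import sqlite3

# Hypothetical payload, mirroring the account example above.
PAYLOAD = """{
  "userName": "myUser",
  "password": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
  "email": "myemail@email.com"
}"""

REQUIRED_FIELDS = ("userName", "password", "email")


def make_db():
    """Create an in-memory SQLite database (stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "user_name TEXT PRIMARY KEY, password_hash TEXT, email TEXT)"
    )
    return conn


def store_account(conn, raw_json):
    """Parse the JSON body, check the required fields, insert one row."""
    account = json.loads(raw_json)
    missing = [f for f in REQUIRED_FIELDS if f not in account]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Parameterized query: values are bound, never spliced into the SQL text.
    conn.execute(
        "INSERT INTO accounts (user_name, password_hash, email) VALUES (?, ?, ?)",
        tuple(account[f] for f in REQUIRED_FIELDS),
    )
    conn.commit()
    return account["userName"]


conn = make_db()
store_account(conn, PAYLOAD)
print(conn.execute("SELECT user_name, email FROM accounts").fetchall())
```

Everything here is done with existing tools (a JSON parser, a field check, and a parameterized query), which is the baseline the question below compares bytescript against.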

Now I’m curious, can you describe an analogue of this simple example that uses bytescript? I’m curious where bytescript would be used in this process and what problem it would be solving.


Steve Rodrigue

Feb 13, 2023

Bytescript could also be used to create a domain-specific language (DSL), parse it, extract its commands, and then execute those commands on the network.

A bit like how a search language or SQL actually works, but on top of the whole ecosystem (ours + web + web3). We could even include bitcoin payments in the process.

Like I said, bytescript will be used for much more than simple database schemas :)


Steve Rodrigue

Feb 13, 2023 (edited)

Hi Preston,

Yes, the example you gave can be easily validated and parsed and its data extracted using current tools.

However, while building our complex AI, we will have different kinds of data sources. We would have to fetch data from websites, read and extract data from different file formats, and make an AI model parse a programming language (like Python) and extract data from it.

We could, for example, use bytescript to validate that submitted data contains a Python script, and have an AI model learn different programming techniques from the scripts it is fed.

This goes for pretty much every file format and data container that currently exists. We could also use bytescript to transpile data from one format into another, so that consumed data lakes could be converted to an SQL format that is easier to consume, for example.

Hope this answers your question properly. If it doesn't, please let me know and I'll provide further details.

Thanks in advance and welcome to my blog! :)


Preston

Feb 13, 2023

Thanks for the quick reply. I am still confused about where bytescript is solving a problem.

Why is it more convenient to transpile a Python script using bytescript in order to store it, rather than just storing the Python script itself?

Also, what makes the conversion of data lake -> bytescript -> SQL easier than a conversion of data lake -> SQL?


Steve Rodrigue

Feb 13, 2023

Well, if we want our AI models to understand what a Python script is and how it is structured, the network first needs to be able to receive submissions from the community and automatically accept or reject them.

With bytescript, we will be able to simply create a schema that describes the composition of a valid Python script and only accept scripts that match it.

Then, our models will be able to receive community-based submissions automatically, which the AI will understand as good scripts, and we'll be able to scale this process to all the data formats we need.

Conversions through bytescript are just quicker to create and can be executed using our interpreter. Bytescript recipes are also easily shared and can be composed from the previous work of other developers.

We'll have an NFT marketplace where developers register bytescript recipes, and other developers can use them in bigger projects extremely easily and be rewarded when a customer buys them. I'll explain it in a future post in the next few days.


Preston

Feb 13, 2023

The validation used in bytescript is basically a subset of what is available using regex, right? So in order to test if something is a Python script or not, you’d basically have to write a regular expression that accepts all valid Python scripts and doesn’t accept any non-valid Python scripts?


Steve Rodrigue

Feb 13, 2023

No, regex is a string validation/extraction DSL. It works well to validate short strings and extract data from them. Bytescript is a token-based DSL.

The grammar portion of bytescript returns an AST, which is much more detailed than the data returned by a regex: https://en.wikipedia.org/wiki/Abstract_syntax_tree. It can be used to create custom lexers and parsers.

In bytescript, you can declare a token, which is a series of bytes declared one after another, and re-use your tokens recursively. Bytescript also works using binary or ASCII data. It could be used to parse an ELF file, for example.
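To make the idea of recursive, byte-level tokens concrete, here is a minimal Python sketch. It is not bytescript syntax and not how the bytescript interpreter works; the Grammar and Node classes and the token names are hypothetical, invented only for illustration. Each token is a list of alternatives, and each alternative is a sequence of literal bytes or references to other tokens (including itself).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One AST node: which token matched and the byte range it covers."""
    name: str
    start: int
    end: int
    children: list = field(default_factory=list)


class Grammar:
    """Hypothetical token table; alternatives are tried in order."""

    def __init__(self):
        self.rules = {}

    def define(self, name, *alternatives):
        # Each alternative is a list of items: a bytes literal,
        # or a str naming another token (possibly this same token).
        self.rules[name] = alternatives

    def match(self, name, data, pos=0):
        """Return a Node if token `name` matches at `pos`, else None."""
        for alternative in self.rules[name]:
            cursor, children = pos, []
            for item in alternative:
                if isinstance(item, bytes):
                    if not data.startswith(item, cursor):
                        break
                    cursor += len(item)
                else:
                    child = self.match(item, data, cursor)
                    if child is None:
                        break
                    children.append(child)
                    cursor = child.end
            else:
                # Every item of this alternative matched.
                return Node(name, pos, cursor, children)
        return None


g = Grammar()
# Binary data: the four magic bytes that open every ELF file.
g.define("elf_magic", [b"\x7fELF"])
# A token that re-uses itself: arbitrarily nested parentheses.
g.define("group", [b"(", "group", b")"], [b"(", b")"])

print(g.match("elf_magic", b"\x7fELF\x02\x01\x01"))
print(g.match("group", b"(())"))
print(g.match("group", b"(()"))
```

The nested-parentheses token is the kind of structure a classic regular expression cannot describe, because the nesting depth is unbounded, while a recursive token handles it and returns a tree instead of a flat match.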

Using regex to lex and parse very complex data is really not recommended. Bytescript itself also does not use regex.

However, a POSIX regex DSL could be created using bytescript.

Here's a Stack Exchange thread that gives more information about when NOT to use regex: https://softwareengineering.stackexchange.com/questions/113237/when-you-should-not-use-regular-expressions


Preston

Feb 13, 2023

My point isn’t that you should use a regex to validate that a script is a Python script. Quite the opposite, actually.

The point is that bytescript feels redundant. We have Python parsers, and we have languages that can be used to write new parsers. We can also use those same languages to write regexes, and we can use DB languages to verify schemas before data is stored in a database.

In the case of Python scripts as well, the validation that is being offered through bytescript recipes is the less important form of validation. It is structural validation rather than functional validation. If I could pay for a recipe to validate my Python script in some way, or if I could write unit tests for my Python script, I’m choosing to write unit tests every time. And the act of running the unit tests will also cover the structural validation that the recipes appear to offer.
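The distinction between structural and functional validation can be shown with Python's own standard-library parser. This is a small sketch under assumptions: the submitted script and the add function are made up, and the structural check uses ast.parse rather than anything bytescript provides.

```python
import ast

# Hypothetical submission: syntactically valid, functionally wrong.
SCRIPT = """
def add(a, b):
    return a - b
"""


def is_structurally_valid(source):
    """Structural check: does the text parse as Python at all?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def passes_unit_test(source):
    """Functional check: run the script and test its behaviour.

    Running the code also parses it, so a syntax error would surface here too.
    Never exec untrusted submissions outside a sandbox.
    """
    namespace = {}
    exec(compile(source, "<submission>", "exec"), namespace)
    return namespace["add"](2, 3) == 5


print(is_structurally_valid(SCRIPT))
print(passes_unit_test(SCRIPT))
```

The script passes the structural check and fails the unit test, so a parser-level check on its own would accept it as a good script.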

