The flow.record project
fox-srt authored and pyrco committed Jul 20, 2022
0 parents commit 0f20ddf
Showing 62 changed files with 8,840 additions and 0 deletions.
7 changes: 7 additions & 0 deletions .github/workflows/dissect-ci.yml
@@ -0,0 +1,7 @@
name: Dissect CI
on: [push, pull_request, workflow_dispatch]

jobs:
  ci:
    uses: fox-it/dissect-workflow-templates/.github/workflows/dissect-ci-template-self-hosted.yml@main
    secrets: inherit
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
coverage.xml
.coverage
dist/
.eggs/
*.egg-info/
*.pyc
__pycache__/
.pytest_cache/
.tox/

flow/record/version.py
5 changes: 5 additions & 0 deletions COPYRIGHT
@@ -0,0 +1,5 @@
Dissect is released as open source by Fox-IT (https://www.fox-it.com) part of NCC Group Plc (https://www.nccgroup.com)

Developed by the Dissect Team (dissect@fox-it.com) and made available at https://github.com/fox-it/flow.record

License terms: AGPL3 (https://www.gnu.org/licenses/agpl-3.0.html)
661 changes: 661 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
exclude .gitignore
exclude .github
105 changes: 105 additions & 0 deletions README.md
@@ -0,0 +1,105 @@
# flow.record

A library for defining and creating structured data (called records) that can be streamed to disk or piped to other
tools that use `flow.record`.

Records can be read and transformed to other formats by using output adapters, such as CSV and JSON.

For more information on how Dissect uses this library, please see [the
documentation](https://dissect.readthedocs.io/en/latest/tools/rdump.html#what-is-a-record).

## Usage

This library contains the tool `rdump`. With `rdump` you can read, write, interact with, and manipulate records from `stdin`
or from record files saved on disk. Please refer to `rdump -h` or to the [`rdump`
documentation](https://dissect.readthedocs.io/en/latest/tools/rdump.html) for all parameters.

Records are the primary output type when using the various functions of `target-query`. The following command shows how
to pipe record output from `target-query` to `rdump`:

```shell
user@dissect~$ target-query -f runkeys targets/EXAMPLE.vmx | rdump
<windows/registry/run hostname='EXAMPLE' domain='EXAMPLE.local' ts=2022-12-09 12:06:20.037806+00:00 name='OneDriveSetup' path='C:/Windows/SysWOW64/OneDriveSetup.exe /thfirstsetup' key='HKEY_CURRENT_USER\\Software\\Microsoft\\Windows\\CurrentVersion\\Run' hive_filepath='C:\\Windows/ServiceProfiles/LocalService/ntuser.dat' username='LocalService' user_sid='S-1-5-19' user_home='%systemroot%\\ServiceProfiles\\LocalService'>
<...>
```

## Programming example

Define a `RecordDescriptor` (schema), then create a few records and write them to disk:

```python
from flow.record import RecordDescriptor, RecordWriter

# define our descriptor
MyRecord = RecordDescriptor("my/record", [
    ("net.ipaddress", "ip"),
    ("string", "description"),
])

# define some records
records = [
    MyRecord("1.1.1.1", "cloudflare dns"),
    MyRecord("8.8.8.8", "google dns"),
]

# write the records to disk
with RecordWriter("output.records.gz") as writer:
    for record in records:
        writer.write(record)
```

The records can then be read from disk using the `rdump` tool or by instantiating a `RecordReader` when using the
library.

```shell
$ rdump output.records.gz
<my/record ip=net.ipaddress('1.1.1.1') description='cloudflare dns'>
<my/record ip=net.ipaddress('8.8.8.8') description='google dns'>
```

### Selectors

We can also use `selectors` to filter and select records with a query written in a Python-like syntax, e.g.:

```shell
$ rdump output.records.gz -s '"google" in r.description'
<my/record ip=net.ipaddress('8.8.8.8') description='google dns'>

$ rdump output.records.gz -s 'r.ip in net.ipnetwork("1.1.0.0/16")'
<my/record ip=net.ipaddress('1.1.1.1') description='cloudflare dns'>
```
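
For readers unfamiliar with selector semantics, the two queries above behave roughly like the plain-Python filters below. This is only an illustrative sketch: it uses `types.SimpleNamespace` and the standard-library `ipaddress` module as stand-ins for flow.record's record and `net.*` types, and `select` is a hypothetical helper, not part of the library.

```python
import ipaddress
from types import SimpleNamespace

# Stand-ins for the two records written earlier (not real flow.record objects).
records = [
    SimpleNamespace(ip=ipaddress.ip_address("1.1.1.1"), description="cloudflare dns"),
    SimpleNamespace(ip=ipaddress.ip_address("8.8.8.8"), description="google dns"),
]


def select(records, predicate):
    """Hypothetical helper: keep the records for which predicate(r) is true."""
    return [r for r in records if predicate(r)]


# Roughly: rdump -s '"google" in r.description'
google = select(records, lambda r: "google" in r.description)

# Roughly: rdump -s 'r.ip in net.ipnetwork("1.1.0.0/16")'
network = ipaddress.ip_network("1.1.0.0/16")
in_network = select(records, lambda r: r.ip in network)

print([r.description for r in google])
print([r.description for r in in_network])
```

In both cases the selector is an expression evaluated once per record, with the record bound to `r`; records for which it is true are kept.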

## Build and test instructions

This project uses `tox` to build source and wheel distributions. Run the following command from the root folder to build
these:

```bash
tox -e build
```

The build artifacts can be found in the `dist/` directory.

`tox` is also used to run linting and unit tests in a self-contained environment. To run both linting and unit tests
using the default installed Python version, run:

```bash
tox
```

For a more elaborate explanation on how to build and test the project, please see [the
documentation](https://dissect.readthedocs.io/en/latest/contributing/developing.html#building-testing).

## Contributing

The Dissect project encourages any contribution to the codebase. To make your contribution fit into the project, please
refer to [the style guide](https://dissect.readthedocs.io/en/latest/contributing/style-guide.html).

## Copyright and license

Dissect is released as open source by Fox-IT (<https://www.fox-it.com>) part of NCC Group Plc
(<https://www.nccgroup.com>).

Developed by the Dissect Team (<dissect@fox-it.com>) and made available at <https://github.com/fox-it/dissect>.

License terms: AGPL3 (<https://www.gnu.org/licenses/agpl-3.0.html>). For more information, see the LICENSE file.
108 changes: 108 additions & 0 deletions examples/filesystem.py
@@ -0,0 +1,108 @@
import os
import stat

from datetime import datetime

from flow.record import RecordDescriptor, RecordWriter

FilesystemFile = RecordDescriptor("""
filesystem/unix/entry
    string path;
    varint inode;
    varint dev;
    unix_file_mode mode;
    filesize size;
    uint32 uid;
    uint32 gid;
    datetime ctime;
    datetime mtime;
    datetime atime;
    string link;
""")


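def hash_file(path, t):
    # Reads the file in 4 KiB chunks; `t` is not used by this example.
    with open(path, "rb") as f:
        while True:
            d = f.read(4096)
            # The file is opened in binary mode, so EOF is b"", not "".
            if not d:
                break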


class FilesystemIterator:
    basepath = None

    def __init__(self, basepath):
        self.basepath = basepath
        self.recordType = FilesystemFile

    def classify(self, source, classification):
        self.recordType = FilesystemFile.base(_source=source, _classification=classification)

    def iter(self, path):
        path = os.path.abspath(path)
        return self._iter(path)

    def _iter(self, path):
        if path.startswith("/proc"):
            return

        st = os.lstat(path)

        abspath = path
        if self.basepath and abspath.startswith(self.basepath):
            abspath = abspath[len(self.basepath):]

        ifmt = stat.S_IFMT(st.st_mode)

        link = None
        if ifmt == stat.S_IFLNK:
            link = os.readlink(path)

        yield self.recordType(
            path=abspath,
            inode=int(st.st_ino),
            dev=int(st.st_dev),
            mode=st.st_mode,
            size=st.st_size,
            uid=st.st_uid,
            gid=st.st_gid,
            ctime=datetime.fromtimestamp(st.st_ctime),
            mtime=datetime.fromtimestamp(st.st_mtime),
            atime=datetime.fromtimestamp(st.st_atime),
            link=link,
        )

        if ifmt == stat.S_IFDIR:
            for i in os.listdir(path):
                if i in (".", ".."):
                    continue

                fullpath = os.path.join(path, i)
                for e in self.iter(fullpath):
                    yield e

chunk = []


if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('target', metavar="TARGET", nargs="*")
    parser.add_argument('-s', dest='source', help="Source")
    parser.add_argument('-c', dest='classification', help="Classification")
    parser.add_argument('-b', dest='base', help="Base directory")

    args = parser.parse_args()

    stream = RecordWriter()

    fsiter = FilesystemIterator(args.base)

    if args.source or args.classification:
        fsiter.classify(args.source, args.classification)

    for path in args.target:
        for r in fsiter.iter(path):
            stream.write(r)
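
The file-type check in `_iter` relies on `stat.S_IFMT`, which masks out the permission bits of `st_mode` and leaves only the file-type bits. A minimal stdlib-only sketch of that check follows. `classify_path` is a hypothetical helper that is not part of this example script, and the temporary directory exists only to give it something to inspect.

```python
import os
import stat
import tempfile


def classify_path(path):
    """Hypothetical helper: name the file type using the same check as _iter."""
    st = os.lstat(path)  # lstat does not follow symlinks
    ifmt = stat.S_IFMT(st.st_mode)  # keep only the file-type bits
    if ifmt == stat.S_IFDIR:
        return "directory"
    if ifmt == stat.S_IFLNK:
        return "symlink"
    if ifmt == stat.S_IFREG:
        return "regular file"
    return "other"


with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "example.txt")
    with open(file_path, "w") as fh:
        fh.write("hello")

    dir_kind = classify_path(tmp)
    file_kind = classify_path(file_path)

print(dir_kind, "/", file_kind)
```

Using `os.lstat` rather than `os.stat` matters here: it reports a symlink as a symlink instead of describing its target, which is what lets the example record the link destination separately.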
71 changes: 71 additions & 0 deletions examples/passivedns.py
@@ -0,0 +1,71 @@
#!/usr/bin/env pypy
import record
import sys
import datetime

import net.ipv4

from fileprocessing import DirectoryProcessor


def ts(s):
    return datetime.datetime.fromtimestamp(float(s))


def ip(s):
    return net.ipv4.Address(s)


class SeparatedFile:
    fp = None
    separator = None
    format = None

    def __init__(self, fp, separator, format):
        self.fp = fp
        self.separator = separator
        self.format = format

    def __iter__(self):
        # Build the descriptor from this instance's format, not the module-level default.
        desc = record.RecordDescriptor([i[0] for i in self.format])
        recordtype = desc.recordType

        for l in self.fp:
            p = l.strip().split(self.separator)

            r = {}
            for i in range(len(self.format)):
                field = self.format[i]

                v = p[i]
                if field[1]:
                    v = field[1](v)

                r[field[0]] = v

            yield recordtype(**r)


def PassiveDnsFile(fp):
    return SeparatedFile(fp, "||", PASSIVEDNS_FORMAT)

PASSIVEDNS_FORMAT = [
    ("ts", ts),
    ("src", ip),
    ("dst", ip),
    ("family", None),
    ("query", None),
    ("query_type", None),
    ("result", None),
    ("ttl", int),
    ("x", None),
]


def main():
    rs = record.RecordOutput(sys.stdout)
    for r in DirectoryProcessor(sys.argv[1], PassiveDnsFile, r"\.log\.gz"):
        rs.write(r)

if __name__ == "__main__":
    main()
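
The parsing approach in `SeparatedFile` (split each line on a separator, then run each column through an optional converter) can be tried in isolation with the sketch below. It is an approximation under stated assumptions: the standard-library `ipaddress` module is swapped in for `net.ipv4.Address`, the field list is shortened, timestamps are converted as UTC rather than local time, the sample line is made up, and `parse_line` is a hypothetical name.

```python
import ipaddress
from datetime import datetime, timezone

# (field name, converter) pairs; None keeps the raw string.
FORMAT = [
    ("ts", lambda s: datetime.fromtimestamp(float(s), tz=timezone.utc)),
    ("src", ipaddress.ip_address),
    ("dst", ipaddress.ip_address),
    ("query", None),
    ("ttl", int),
]


def parse_line(line, separator="||", fmt=FORMAT):
    """Split one line and run each column through its converter."""
    parts = line.strip().split(separator)
    row = {}
    for (name, convert), value in zip(fmt, parts):
        row[name] = convert(value) if convert else value
    return row


# Made-up sample line, not taken from a real log.
row = parse_line("1322849924.408856||10.1.1.1||8.8.8.8||example.com.||3600\n")
print(row)
```

Keeping the format as data (a list of name and converter pairs) means a new log layout only needs a new list, not a new parser.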
2 changes: 2 additions & 0 deletions examples/records.json
@@ -0,0 +1,2 @@
{"_type": "recorddescriptor", "_data": ["text/paste", [["string", "key"], ["datetime", "date"], ["datetime", "expire_date"], ["wstring", "title"], ["wstring", "content"], ["wstring", "user"], ["wstring", "syntax"]]]}
{"_classification": "PUBLIC", "_generated": "2019-03-19T09:11:04.706581", "_source": "external/pastebin", "_type": "record", "_recorddescriptor": ["text/paste", 831446724], "_version": 1, "content": "This is the content of a sample pastebin record", "date": "2019-03-19T09:09:47", "expire_date": "1970-01-01T00:00:00", "key": "Q42eWSaF", "syntax": "text", "title": "A sample pastebin record", "user": ""}
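
The two lines above follow a simple pattern: a `recorddescriptor` line declares the field types and names, and each `record` line refers back to it by name and hash. The sketch below reads that layout with the standard-library `json` module only. It is not how flow.record's own JSON adapter is implemented, and the embedded lines are shortened copies of the sample, so the hash value is kept purely for illustration.

```python
import json

# Shortened copies of the two sample lines; the field list is trimmed for brevity,
# so the descriptor hash below is illustrative only.
lines = [
    '{"_type": "recorddescriptor", "_data": ["text/paste", [["string", "key"], ["datetime", "date"]]]}',
    '{"_type": "record", "_recorddescriptor": ["text/paste", 831446724], "key": "Q42eWSaF", "date": "2019-03-19T09:09:47"}',
]

descriptors = {}  # descriptor name -> list of (type, field name) pairs
records = []  # plain dicts holding only the declared fields

for line in lines:
    obj = json.loads(line)
    if obj["_type"] == "recorddescriptor":
        name, fields = obj["_data"]
        descriptors[name] = [tuple(f) for f in fields]
    elif obj["_type"] == "record":
        name, _hash = obj["_recorddescriptor"]
        # Keep only the fields that the matching descriptor declares.
        wanted = [field for _type, field in descriptors[name]]
        records.append({field: obj[field] for field in wanted})

print(descriptors)
print(records)
```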
43 changes: 43 additions & 0 deletions examples/tcpconn.py
@@ -0,0 +1,43 @@
import random

from datetime import datetime
from flow import record

conn = record.RecordDescriptor("""
network/traffic/tcp/connection
    datetime ts;
    net.ipv4.Address src;
    net.tcp.Port srcport;
    net.ipv4.Address dst;
    net.tcp.Port dstport;
""")

ip_list = [
    "127.0.0.1",
    "1.2.3.4",
    "212.33.1.45",
    "4.4.4.4",
    "8.8.8.8",
    "212.1.6.1",
]

port_list = [
    22,
    53,
    80,
    443,
    5555
]

rs = record.RecordWriter()

for i in range(500):
    r = conn(
        ts=datetime.now(),
        src=random.choice(ip_list),
        srcport=random.choice(port_list),
        dst=random.choice(ip_list),
        dstport=random.choice(port_list)
    )

    rs.write(r)