Log streaming agent by Treasure Data.
- Key Points
- Ports
- Treasure Data distribution
- Log locations
- Web UI
- Fluentular
- Fluent-cat
- Monitoring
- High Availability
- Config
- Fluent Bit
- versatile log collector written in C + Ruby
- 500+ plugins
- better performance and smaller memory footprint than LogStash
- treats logs in JSON
- scales (largest user 50,000+ servers)
- Ruby 2.1 + ruby-dev to build gem, or install rpm from Treasure Data repo
- spools buffer to disk - data loss only if disk is lost or proc dies in between receive + write to disk (flush interval)
- lot of new params in 0.14 eg.
@type tail
- Fluentd => MongoDB tried by many users, doesn't scale well
- often used with Elasticsearch and Kibana - called the EFK stack for each technology
- can also be used with Prometheus
- often used in Kubernetes, cloud-native and microservices environments
DockerHub image:
fluent/fluentd
Default username / password: admin
/ changeme
Port | Service | Type |
---|---|---|
9292 | WebUI | fluent-ui |
9880 | HTTP API | @type http |
24220 | Rest API | @type monitor-agent |
24224 | TCP forward | @type forward |
24230 | Debug port | @type debug_agent |
nmap -p 24224 -sU <host>
td-agent
/etc/td-agent/td-agent.conf
service td-agent
Validate config:
fluentd --dry-run -c fluent.conf
Capture Fluentd's own logs + send to logserver:
<match fluentd.**>
...
/var/log/fluentd
/var/log/td-agent
Management & Regex Tester
Port 9292
fluent-ui start
or for Treasure Data distribution:
td-agent-ui start
http://fluentular.herokuapp.com
Fluentd Regex Tester
- runs on Heroku free tier so might be unavailable some of the time, retry later
fluent-cat [ -h 127.0.0.1 -p 24224 ] [ -u [ -s /var/run/fluent/fluent.sock] ]
echo '{"message": "hello"}' | fluentd-cat <mytag>
interact with debug port if enabled - see Monitoring section below for enabling @type debug_agent
.
fluent-debug
Rest API
Per plugin stats
/api/plugins.json
<source>
@type monitor_agent
bind 0.0.0.0
port 24220
</source>
fluent-debug command connects here
<source>
@type debug_agent
bind 127.0.0.1
port 24230
</source>
<match **>
@type forward
# primary
<server>
host x.x.x.x
port 24224
</server>
# backup
<server>
host y.y.y.y
port 24224
</server>
# reduces CPU usage large flush interval
# <buffer>
# flush_interval 60s
# </buffer>
</match>
Processes events top down as defined in config file.
@plugin
- all plugins are prefixed with @
symbol
Component | Description |
---|---|
source | input, tags everything on inbound |
match | matches tag => sends to output plugin |
filter | pass / reject / modify some events based on @type or tag, can be used to add fields, or 'out_relabel' plugin to add new tag |
system | system-wide config |
label | groups outputs + filters for internal routing - @label @blah in |
@include | include other files in lexical order use multiple explicit @include to enforce ordering @include config./*.conf |
tag | dot separated components |
* | match single event name component (dot delimited) |
** | match zero or more event name components - use to match all logs or prefixed logs |
global settings
<system>
log_level info
suppress_repeated_stackstace
process_name myname # ps will show 'worker:myname' and 'supervisor:myname'
@type
- which input plugin to usehttp
- http listenerforward
- tcp listener- receives events from
24224/tcp
- receives events from
This is used by log forwarding and the fluent-cat
command:
<source>
@type forward
port 24224
@label @blah # processing continues in <label @blah> block
</source>
Send event with tag my.app.tag
:
http://this.host:9880/my.app.tag?json={"event":"data"}
<source>
@type http
port 9880
</source>
<filter my.app.tag>
@type record_transformer
# add host_param field with Ruby hostname
<record>
host_param "#{Socket.gethostname}"
</record>
</filter>
Matches in config file order, put more specific at top, otherwise they'll be consumed by more generic matches
Match events with tag my.app.tag
+ send to output plugin file
:
<match my.app.tag>
@type file
path /var/log/fluent/myapp.log
</match>
Match any of multiple whitespace separated tags:
<match tag1 tag2>
Only events with <source> @label @blah
go in to this block:
<label @blah>
<filter myprefix.**>
@type grep
...
</filter>
<match **>
@type s3
...
</match>
</label>
<source>
@type tail
@id yarn_resourcemanager_input
format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<my_event_time>[^ ]* [^ ]*) (<?level>[^ ]*) (?<class>[^:]*) (?<message>.*)$/
time_key my_event_time
keep_time_key true
path /opt/mapr/hadoop/hadoop-*/logs/yarn-*-resourcemanager-*.log
tag resourcemanager
pos_file /opt/mapr/fluentd/fluentd-0.14.00/var/log/fluentd/tmp/resourcemanager.pos
</source>
<store>
@type remote_syslog
host x.x.x.x
port 51400
severity debug
tag fluentd
</store>
Common Params:
Attribute | Description |
---|---|
@type |
Specifies the plugin |
@id |
plugin_id field shown in monitor_agent Rest API |
@label |
|
@log_level |
Per plugin log level |
fluent-gem
and td-agent-gem
are wrappers around gem
command.
Use these otherwise won't be able to find gems via OS or RVM commands.
fluent-gem install fluent-plugin-grep
or for Treasure Data distribution:
tg-agent-gem install fluent-plugin-grep
Pin specific version for prod to avoid regressions:
fluent-gem install fluent-plugin-elasticsearch -v 2.4.1
fluentd -p /path/to/plugin -p /path/to/plugin2
Auto-loads plugins from:
/etc/fluentd/plugin
or
/etc/td-agent/plugin
/etc/fluentd/Gemfile
:
source 'http://rubygems.org'
gem 'fluentd-pluing-elasticsearch', '2.4.1'
gem ...
##
fluentd --gemfile /etc/fluentd/Gemfile
fluendtd -S
Update plugins after updating Ruby as Ruby doesn't guarantee C extension compatibility between major versions so plugins using C might break.
@type
:
- tail
- forward (tcp)
- tcp (delimited, default: \n)
- udp
- http - POST
- syslog
- exec - interval or long running program - fluentd expects tsv, msgpack or json stdout
- dummy - generates events for testing
- windows_eventlog
Set in input <sources>
-
extract fields from log
<key>
(eg. message) -
Fluent UI has Regex tester
-
regexp -
(?<fieldname>regex)
- set fieldname totime
to reset time of event from content, configurable bytime_key
-
apache2 - shows regex used in doc
-
apache-error
-
nginx
-
syslog - socket text protocol parser
-
LTSV - Labelled Tab Separated Values blah:val\tblah2:val2...
-
CSV - provide array of keys as headers, delimiter
-
TSV
-
json - oj / yajl / json parser types (default: oj)
-
multiline - multi-line version of regex parser
- uses tstamp regex as delimiter + sends logs when next log is generated
format_firstline /^\d{4}-\d{1,2}-\d{1,2}
format1 /line1_regex/
# capture different line formats that are part of the multi-line logsformat2 /line2_regex/
...
-
grok - github by fluent to allow grok patterns + pattern files
Strongly recommended to be set in any output match plugin to write when destination servers are unavailable or buffers are full:
<secondary>
out_secondary_file /path/to/file
Plugin | Description |
---|---|
out_copy |
Sends to 2 or more locations |
out_exec |
TSV stdin |
out_webhdfs |
Hourly - to output, appends every 10 secs (hdfs-site.xml - need to set dfs.support.append = true , dfs.support.broken.append = true ) |
out_s3 |
|
out_forward |
<secondary> is recommended in case other servers are unavailable |
out_roundrobin |
|
out_stdout |
For debugging, stdout in foreground / logs in daemon mode |
out_exec_filter |
|
out_file |
|
out_mongo |
|
out_mongo_replset |
|
out_none |
Stores message in 'message' field to defer parsing for later |
out_null |
|
out_relabel |
Does nothing, just allows @label @blah to be added |
- AWS
- GCP
- NoSQL
- Search - Elasticsearch - user + password + ssl verify options for X-Pack authentication
Set in each output plugin:
Format | Description |
---|---|
file |
<tstamp>\t<tag>\t<json> |
json |
|
ltsv |
|
csv |
|
msgpack |
MessagePack binary encoding, more efficient than JSON |
hash |
Dictionary |
single_value |
Output a single field, e.g., message , instead of all fields |
stdout |
Time tag: <json_msg> |
Filter | Description |
---|---|
filter_record_transformer |
Add/mutate tags/message using Ruby #{code_snippet} |
filter_record_modifier |
Faster subset of common use cases |
filter_group |
Filters only messages where key fields match regex |
filter_parse |
Parses given key field and adds fields to message's JSON (like LogStash grok) |
filter_geoip |
Adds fields using Maxmind GeoLite2 free geoip database (free but less accurate than their commercial version, uses geoip-devel rpm / libgeoip-dev deb) |
filter_stdout |
Debugging |
- file - dependent on the filesystem, don't use remote NFS, HDFS, GlusterFS etc.
- memory:
<match **>
buffer_type file
# buffer paths must have separate non-overlapping prefixes to not interfere with each other
# as they are recursively collected - see doc for clarity
buffer_path /var/log/fluent/myapp.*.buffer
# * is replaced by random chars, could use ${tag} but must be unique between all fluentd instances
DockerHub:
fluent/fluent-bit
- fast lightweight log collector / forwarder (like Beats)
td-agent-bit
rpm / deb- simpler ini config style
Fluentd | Fluentbit |
---|---|
C & Ruby | C |
~ 40MB | ~ 450KB |
Ruby gem, requires other gems | Zero dependencies (unless some special plugin requires them) |
650+ plugins | 35+ plugins |
Ported from private Knowledge Base page 2017