Skip to content

在SeaTunnel,Config文件非常重要,用户可以最大化地定制他们的数据同步方案。所以,接下来,我们将介绍如何配置Config文件。

Config文件最重要的格式是hocon,更多介绍可以参考HOCON-GUIDE。同时,SeaTunnel还支持json格式,但是config文件命名需要以.json结尾。

hocon格式

sh
env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
        card = "int"
      }
    }
  }
}

transform {
  Filter {
    source_table_name = "fake"
    result_table_name = "fake1"
    fields = [name, card]
  }
}

sink {
  Clickhouse {
    host = "clickhouse:8123"
    database = "default"
    table = "seatunnel_console"
    fields = ["name", "card"]
    username = "default"
    password = ""
    source_table_name = "fake1"
  }
}

json格式

sh

{
  "env": {
    "job.mode": "batch"
  },
  "source": [
    {
      "plugin_name": "FakeSource",
      "result_table_name": "fake",
      "row.num": 100,
      "schema": {
        "fields": {
          "name": "string",
          "age": "int",
          "card": "int"
        }
      }
    }
  ],
  "transform": [
    {
      "plugin_name": "Filter",
      "source_table_name": "fake",
      "result_table_name": "fake1",
      "fields": ["name", "card"]
    }
  ],
  "sink": [
    {
      "plugin_name": "Clickhouse",
      "host": "clickhouse:8123",
      "database": "default",
      "table": "seatunnel_console",
      "fields": ["name", "card"],
      "username": "default",
      "password": "",
      "source_table_name": "fake1"
    }
  ]
}

Env

环境配置

  • job.name

任务名

  • jars

使用三方jars包,比如:jars="file://local/jar1.jar;file://local/jar2.jar"

  • job.mode

指定任务是批模式还是流模式,job.mode = "BATCH"为批模式,job.mode = "STREAMING"为流模式

  • checkpoint.interval

  • parallelism

并发数

  • shade.identifier

Source

定义SeaTunnel从哪里获取数据。支持同时配置多个源,每个源都有特有的参数用于定义如何获取数据。

此处以FakeSource数据源为例:

  • FakeSource

虚拟数据源,可以根据用户定义的数据结构随机生成数据,仅用于测试场景。

SeaTunnel中文文档