Discard Manual Scheduling With DolphinScheduler 3.1.x Cluster Deployment

1. Preface

For Apache DolphinScheduler cluster deployment, the author has put together a guide that can be followed from start to finish, making subsequent operations such as deployment, upgrades, and adding or removing nodes easier.

2. Preparations

2.1. Basic Components

JDK: Download JDK (1.8+), install it, and configure the JAVA_HOME environment variable, appending its bin directory to PATH (a sketch follows this list). Skip this step if a JDK is already installed.
Binary Package: Download the DolphinScheduler binary package from the official Apache DolphinScheduler download page.
Database: PostgreSQL (8.2.15+) or MySQL (5.7+); either works. For MySQL, the JDBC Driver 8.x (MySQL Connector/J) is required and can be downloaded from the Maven central repository.
Registry Center: ZooKeeper (3.4.6+). Download it from the official ZooKeeper release page.
Process Tree Analysis:
macOS: install pstree.
Fedora/Red Hat/CentOS/Ubuntu/Debian: install psmisc.
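
A minimal sketch of configuring JAVA_HOME system-wide is shown below; the JDK path is only an example, so adjust it to your installation:

# Run as root; the JDK path below is an example
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_202
export PATH=$JAVA_HOME/bin:$PATH
EOF
source /etc/profile
java -version   # verify the expected JDK is picked up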


Note: DolphinScheduler does not depend on Hadoop, Hive, Spark, etc., but if your tasks require them, corresponding environment support is needed.

3. Upload

Upload the binary package and extract it to a directory of your choice.


Pay attention to directory names; it’s advisable to add some characters to differentiate between the installation directory and the directory where the binary package is extracted. For example:

tar -xvf apache-dolphinscheduler-3.1.7-bin.tar.gz
mv apache-dolphinscheduler-3.1.7-bin dolphinscheduler-3.1.7-origin


The ‘-origin’ suffix indicates the original extracted binary package. When there are configuration changes later, you can modify the files in this directory and then re-execute the installation script.

4. User Configuration

4.1. Configure User Permissions and Passwordless Access

Create a deployment user and ensure to configure sudo passwordless access. For example:

# Create user (requires root login)
useradd dolphinscheduler

# Set password
echo "dolphinscheduler" | passwd --stdin dolphinscheduler

# Configure sudo passwordless access
sed -i '$a dolphinscheduler ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers

# Modify directory permissions to grant deployment user access to the extracted apache-dolphinscheduler-*-bin directory
chown -R dolphinscheduler:dolphinscheduler apache-dolphinscheduler-*-bin


Note:

The deployment user needs passwordless sudo privileges because the task execution service uses sudo to switch to the tenant user when running tasks. Beginners can ignore this for now.
If /etc/sudoers contains the line "Defaults requiretty", comment it out.

4.2. Configure SSH Passwordless Login for Machines

SSH passwordless login is required for resource transfer between different machines. Follow these steps to configure it:

su dolphinscheduler

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Execute the following command; otherwise, passwordless login will fail
chmod 600 ~/.ssh/authorized_keys


Note: After configuration, you can test by running ssh localhost to check if login without password is successful.
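
The steps above only set up passwordless login to localhost. For a multi-machine cluster, the deployment user's public key also needs to reach every host listed in install_env.sh so that install.sh can scp files to them. A minimal sketch, using hypothetical hostnames:

# Run as the dolphinscheduler user; repeat for every machine in the `ips` list
ssh-copy-id dolphinscheduler@ds02
ssh-copy-id dolphinscheduler@hadoop02
ssh dolphinscheduler@ds02 hostname   # should succeed without prompting for a password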

5. Start ZooKeeper

Simply start ZooKeeper in the cluster.
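
For example, assuming ZooKeeper is installed under /opt/zookeeper on every ZooKeeper node (adjust the path to your environment):

/opt/zookeeper/bin/zkServer.sh start
/opt/zookeeper/bin/zkServer.sh status   # one node should report "leader", the others "follower"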

6. Modify Configuration

All the following operations should be executed under the dolphinscheduler user.


After preparing the basic environment, modify the configuration files based on your machine environment. Configuration files can be found in the bin/env directory, namely install_env.sh and dolphinscheduler_env.sh.

6.1. install_env.sh

The install_env.sh file defines which machines DolphinScheduler will be installed on and which services will run on each machine. You can find it in the bin/env/ directory; modify the corresponding configuration items as described below.

# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma separated list of machine hostname or IP would be installed DolphinScheduler,
# including master, worker, api, alert. If you want to deploy in pseudo-distributed
# mode, just write a pseudo-distributed hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
# Configure the machines where DolphinScheduler will be installed.
ips=${ips:-"ds01,ds02,ds03,hadoop02,hadoop03,hadoop04,hadoop05,hadoop06,hadoop07,hadoop08"}

# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
sshPort=${sshPort:-"22"}

# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
# Configure the machines where the Master server will be installed.
masters=${masters:-"ds01,ds02,ds03,hadoop04,hadoop05,hadoop06,hadoop07,hadoop08"}

# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>. All hostname or IP must be a
# subset of configuration `ips`, and workerGroup has the default value `default`, but we recommend you declare it behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
# To configure which machines the Worker role will be installed on, specify a comma-separated list of machine hostnames or IP addresses along with their corresponding worker groups. By default, all workers are placed in the `default` worker group. Additional worker groups can be configured later through the DolphinScheduler interface.
workers=${workers:-"ds01:default,ds02:default,ds03:default,hadoop02:default,hadoop03:default,hadoop04:default,hadoop05:default,hadoop06:default,hadoop07:default,hadoop08:default"}

# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
# To configure which machine the Alert role will be installed on, specify a single machine
alertServer=${alertServer:-"hadoop03"}

# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
# To configure which machine the API role will be installed on, specify a single machine
apiServers=${apiServers:-"hadoop04"}

# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you are using a relative path.
# Installation path: DolphinScheduler will be installed at this path on every machine in the cluster. Make sure it differs from the directory where the binary package was extracted, and preferably include the version number to simplify later upgrades.
installPath=${installPath:-"/opt/dolphinscheduler-3.1.7"}

# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled then the root directory needs
# to be created by this user
# Deployment user: use the user created above for deployment.
deployUser=${deployUser:-"dolphinscheduler"}

# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
# Configure the znode name registered in ZooKeeper. If multiple DolphinScheduler clusters share the same ZooKeeper, each cluster needs a different name.
zkRoot=${zkRoot:-"/dolphinscheduler"}

6.2. dolphinscheduler_env.sh

You can find this file at the path bin/env/. It is used to configure some environment settings. Modify the corresponding configurations according to the following instructions:

# JDK path, must be modified
export JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_202}

# Database type, supports mysql, postgresql
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
# Connection URL: mainly modify the hostname; the serverTimezone parameter at the end sets UTC+8 (Asia/Shanghai)
export SPRING_DATASOURCE_URL="jdbc:mysql://hostname:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai"
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
# If the password is complex, it needs to be enclosed in single quotes before and after
export SPRING_DATASOURCE_PASSWORD='xxxxxxxxxxxxx'

export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
# Configure the time zone the JVM uses when each role starts. The default is UTC; to fully support UTC+8 (China Standard Time), set it to GMT+8
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-GMT+8}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
# Configure the zookeeper address used
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-hadoop01:2181,hadoop02:2181,hadoop03:2181}

# Configure the environment variables below according to your needs; install the required components yourself
export HADOOP_HOME=${HADOOP_HOME:-/opt/cloudera/parcels/CDH/lib/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/spark-3.3.2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/python-3.9.16}
export HIVE_HOME=${HIVE_HOME:-/opt/cloudera/parcels/CDH/lib/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/flink-1.15.3}
export DATAX_HOME=${DATAX_HOME:-/opt/datax}
export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/seatunnel-2.1.3}
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH

6.3. common.properties

Download the hdfs-site.xml and core-site.xml files from your Hadoop cluster and place them in the api-server/conf/ and worker-server/conf/ directories. If you have set up an Apache native cluster, retrieve these files from the respective component’s conf directory. For CDH, you can directly download them from the CDH interface.
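
For example, on a node where the Hadoop client configuration lives under /etc/hadoop/conf (an assumed location; adjust to your cluster):

cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml api-server/conf/
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml worker-server/conf/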


Next, modify the common.properties files located in the api-server/conf/ and worker-server/conf/ directories. They mainly configure parameters related to resource uploads, such as uploading DolphinScheduler’s resources to HDFS. Follow the instructions below to make the necessary modifications:

# Local path, mainly used to store temporary files during task execution. Ensure that the user has read and write permissions for this directory. Generally, keep the default. If you encounter permission errors during task execution indicating insufficient permissions for files in this directory, simply change the directory permissions to 777.
data.basedir.path=/tmp/dolphinscheduler

# Resource view suffixes
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

# Location to save resources, possible values: HDFS, S3, OSS, NONE
resource.storage.type=HDFS
# Base path for resource uploads, must start with /dolphinscheduler, ensure that the user has read and write permissions for this directory
resource.storage.upload.base.path=/dolphinscheduler

# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id=minioadmin
# The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.secret.access.key=minioadmin
# The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.region=cn-north-1
# The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
resource.aws.s3.bucket.name=dolphinscheduler
# You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
resource.aws.s3.endpoint=http://localhost:9000

# alibaba cloud access key id, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.id=<your-access-key-id>
# alibaba cloud access key secret, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.secret=<your-access-key-secret>
# alibaba cloud region, required if you set resource.storage.type=OSS
resource.alibaba.cloud.region=cn-hangzhou
# oss bucket name, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.bucket.name=dolphinscheduler
# oss bucket endpoint, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com

# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=hdfs
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
#
resource.hdfs.fs.defaultFS=hdfs://bigdata:8020

# whether to startup kerberos
hadoop.security.authentication.startup.state=false

# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf

# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM

# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab

# kerberos expire time, the unit is hour
kerberos.expire.time=2

# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=hadoop02,hadoop03
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://hadoop02:19888/ws/v1/history/mapreduce/jobs/%s

# datasource encryption enable
datasource.encryption.enable=false

# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar

#data-quality.error.output.path=/tmp/data-quality-error-data

# Whether hive SQL is executed in the same session
support.hive.oneSession=false

# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn’t need sudo permissions
sudo.enable=true
setTaskDirToTenant.enable=false

# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred=

# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy=default

# system env path
#dolphinscheduler.env.path=dolphinscheduler_env.sh

# development state
development.state=false

# rpc port
alert.rpc.port=50052

# set path of conda.sh
conda.path=/opt/anaconda3/etc/profile.d/conda.sh

# Task resource limit state
task.resource.limit.state=false

# mlflow task plugin preset repository
ml.mlflow.preset_repository=https://github.com/apache/dolphinscheduler-mlflow
# mlflow task plugin preset repository version
ml.mlflow.preset_repository_version="main"

6.4. application.yaml

You need to modify the /conf/application.yaml file for all roles, including: master-server/conf/application.yaml, worker-server/conf/application.yaml, api-server/conf/application.yaml, and alert-server/conf/application.yaml. The main modification is to set the time zone. Here’s the specific modification:

spring:
  banner:
    charset: UTF-8
  jackson:
    # Set the time zone to GMT+8; modify only this section
    time-zone: GMT+8
    date-format: "yyyy-MM-dd HH:mm:ss"
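
If you prefer not to edit the four files by hand, a possible shortcut is a small sed loop; this assumes GNU sed and that the shipped default is time-zone: UTC:

for role in master-server worker-server api-server alert-server; do
  sed -i 's/time-zone: UTC/time-zone: GMT+8/' "$role/conf/application.yaml"
done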

6.5. service.57a50399.js and service.57a50399.js.gz

You’ll find these two files, service.57a50399.js and service.57a50399.js.gz, in both the api-server/ui/assets/ and ui/assets/ directories.


Navigate to each of these directories and locate the files. Open them with vim, search for 15e3, and change it to 15e5. This adjusts the timeout for page responses: the default 15e3 means 15 seconds (15,000 ms), while 15e5 means 1,500 seconds. This change prevents page-timeout errors when uploading large files.
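
A non-interactive alternative to editing with vim is sketched below; it assumes GNU sed and gzip 1.6+ (for the -k flag), and regenerates the .gz copy from the patched file so the two stay consistent:

for dir in api-server/ui/assets ui/assets; do
  sed -i 's/15e3/15e5/g' "$dir/service.57a50399.js"
  gzip -kf "$dir/service.57a50399.js"
done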

7. Initialize the Database

To initialize the database, follow these steps:


Driver Configuration:
Copy the MySQL driver (8.x) to the libs directory of each DolphinScheduler role, including the following (a copy loop is sketched after this list):

api-server/libs

alert-server/libs

master-server/libs

worker-server/libs

tools/libs
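
A possible copy loop is shown below; the jar file name is an example, so substitute the MySQL Connector/J 8.x jar you actually downloaded:

for d in api-server alert-server master-server worker-server tools; do
  cp mysql-connector-j-8.0.33.jar "$d/libs/"
done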

Database User:

Log in to MySQL with the root user.
Execute the following SQL commands (both MySQL 5 and MySQL 8 are supported):

create database `dolphinscheduler` character set utf8mb4 collate utf8mb4_general_ci;
create user 'dolphinscheduler'@'%' IDENTIFIED WITH mysql_native_password by 'your_password';
grant ALL PRIVILEGES ON dolphinscheduler.* to 'dolphinscheduler'@'%';
flush privileges;


Execute Database Upgrade Script:
Run the following command to execute the database upgrade script:

bash tools/bin/upgrade-schema.sh

8. Installation

Run the installation script:

bash ./bin/install.sh


This script will remotely transfer all local files to the machines configured in the above configuration files using scp. It will then stop the corresponding roles on each machine and start them again.

After the first installation, all roles will be started automatically. There’s no need to start any roles separately. If any roles are not started, you can check the corresponding logs on the respective machines to identify the specific issues.
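
A quick way to verify the deployment is to run jps on each node as the deployment user and check that the expected role processes are present. The process names below are what a typical 3.1.x installation registers, but they may vary slightly by version:

jps | grep -E 'MasterServer|WorkerServer|ApiApplicationServer|AlertServer'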

9. Start and stop the services

Stop all services:

bash ./bin/stop-all.sh

Start all services:

bash ./bin/start-all.sh

Start/Stop Master:

bash ./bin/dolphinscheduler-daemon.sh stop master-server
bash ./bin/dolphinscheduler-daemon.sh start master-server

Start/Stop Worker:

bash ./bin/dolphinscheduler-daemon.sh start worker-server
bash ./bin/dolphinscheduler-daemon.sh stop worker-server

Start/Stop Api:

bash ./bin/dolphinscheduler-daemon.sh start api-server
bash ./bin/dolphinscheduler-daemon.sh stop api-server

Start/Stop Alert:

bash ./bin/dolphinscheduler-daemon.sh start alert-server
bash ./bin/dolphinscheduler-daemon.sh stop alert-server


It’s crucial to note that you must execute these scripts using the user who installed DolphinScheduler to avoid permission issues.


Each service has a dolphinscheduler_env.sh file in its <service>/conf/ directory, which is convenient for per-service requirements: you can configure <service>/conf/dolphinscheduler_env.sh for a given service and then start it with its own environment variables using the <service>/bin/start.sh command. However, if you start a service with bin/dolphinscheduler-daemon.sh start <service>, it will first overwrite <service>/conf/dolphinscheduler_env.sh with bin/env/dolphinscheduler_env.sh and then start the service. This is done to reduce the cost of users modifying configurations.
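
For example, taking the Worker as an illustration:

# Start only the worker on this node, using worker-server/conf/dolphinscheduler_env.sh
worker-server/bin/start.sh

# Start it through the daemon script instead: bin/env/dolphinscheduler_env.sh is copied over the service-local file first
bash ./bin/dolphinscheduler-daemon.sh start worker-server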

10. Scaling Out

10.1. Standard Method

Refer to the steps above and follow these operations:


New Node
– Install and configure JDK.
– Create a new user for DolphinScheduler (Linux user) and configure passwordless login and permissions.

On the machine where DolphinScheduler was previously installed and the binary package was extracted:
Log in as the user who installed DolphinScheduler.
Modify the configuration file bin/env/install_env.sh in that directory, adding the new node and specifying which roles should be deployed on it.
Execute the ./bin/install.sh script. This script retransmits the entire directory to all machines configured in bin/env/install_env.sh, then stops all roles on all machines, and finally restarts them.


Disadvantages of this method: if DolphinScheduler runs many minute-level tasks or real-time tasks such as Flink or Spark jobs, stopping and restarting all roles takes some time, and during that window tasks may stop abnormally or fail to be scheduled because the whole cluster restarts. However, DolphinScheduler provides automatic fault tolerance and disaster recovery, so this operation is feasible; afterwards, check that all tasks are running normally.

10.2. Simple Method

Refer to the steps above and follow these operations:

New Node

Install and configure JDK.
Create a new user for DolphinScheduler (Linux user) and configure passwordless login and permissions.

On the machine where DolphinScheduler was previously installed and the binary package was extracted:
Log in as the user who installed DolphinScheduler.
Compress the entire previously configured directory, then transfer it to the new node.

New Node

Uncompress the archive on the new node and rename the directory to the installation path configured in bin/env/install_env.sh.
Log in as the user who installed DolphinScheduler.
Start the roles that need to be deployed on the new node using bin/dolphinscheduler-daemon.sh under the installation directory. The start commands are:

./dolphinscheduler-daemon.sh start master-server
./dolphinscheduler-daemon.sh start worker-server

Log in to the DolphinScheduler interface and observe in the “Monitor Center” whether the corresponding roles have started on the new node.

11. Scaling In

Stop all roles on the machine to be removed using bin/dolphinscheduler-daemon.sh under the installation directory. For example:

./dolphinscheduler-daemon.sh stop worker-server

Log in to the DolphinScheduler interface and check in the “Monitor Center” that the stopped roles have disappeared from that machine.
On the machine where you previously installed DolphinScheduler by extracting the binary package:
Log in as the user who installed DolphinScheduler.
Modify the configuration file bin/env/install_env.sh and remove the decommissioned machine from the corresponding role lists.

12. Upgrade

Follow the steps above step by step. For operations that have been performed before, there is no need to perform them again. Below are some specific operation steps:

Upload the new version binary package.
Uncompress it to a directory different from the old version installation directory, or rename it.
Modify the configuration files. A simpler way is to copy all the configuration files changed during the previous installation into the new version’s directory, replacing the defaults.
Package the components deployed on other nodes, then unpack and place them in the corresponding locations on the new node; to determine which components need to be copied, refer to the configuration in dolphinscheduler_env.sh.
Configure the drivers, referring to the steps in “Initializing the Database”.
Stop the previous cluster.
Back up the entire database (for example with mysqldump, as sketched after this list).
Execute the database upgrade script, referring to the steps in “Initializing the Database”.
Execute the installation script, referring to “Installation”.
After the upgrade is complete, log in to the interface and check the “Monitor Center” to see if all roles have started successfully.
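
A minimal backup sketch, assuming MySQL and a metadata database named dolphinscheduler:

mysqldump -u dolphinscheduler -p dolphinscheduler > dolphinscheduler-backup-$(date +%F).sql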
