The Flink training requires a bit of setup which involves downloading software binaries, Maven dependencies, and test data. Because we all know how reliable and well-performing conference WiFi tends to be, we kindly ask you to do the setup for the training in advance.
If you successfully complete the setup instructions beforehand, the training starts for you at 9:30 am.
If you have trouble with the setup or do not have a chance to prepare in advance, we have an optional setup session starting at 9:00 am.
So go ahead and earn yourself 30 minutes of sleep by running the following steps.
1. Software requirements
Flink supports Linux, OS X, and Windows as development environments for Flink programs and local execution. The following software is required for a Flink development setup and should be installed on your system.
- Java JDK 7 (or higher)
- Apache Maven 3.x
- Git
- an IDE for Java (and/or Scala) development (follow these instructions (https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/ide_setup.html) to set up IntelliJ IDEA or Eclipse)
In previous trainings we had the best experiences with UNIX-based setups. If your main operating system is Windows, we recommend setting up a virtual machine running Linux.
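As a quick sanity check of the requirements above, the following sketch reports whether the command-line tools are on your PATH. The helper name check_tool is made up for this illustration, and the script assumes a POSIX shell; it does not check versions or the IDE.

```shell
# Print whether a tool is on the PATH.
# The exit status is non-zero if the tool is missing.
# check_tool is a hypothetical helper, not part of Flink or Maven.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "MISSING: $1"
    return 1
  fi
}

missing=0
for tool in java mvn git; do
  check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
  echo "All required command-line tools are installed."
else
  echo "Install the missing tools before continuing."
fi
```

To see which versions are installed, run java -version, mvn -version, and git --version.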
2. Prepare a Flink Maven project
Flink provides Maven archetypes to correctly set up Maven projects for Java or Scala Flink programs. Follow the next steps to set up a Flink Maven quickstart project which can be used for the programming exercises.
Generate a Flink Maven project for the exercises
Run one of the following commands to generate a Flink Java or Scala project.
Flink Java Project
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.1.2 \
-DgroupId=org.apache.flink.quickstart \
-DartifactId=flink-java-project \
-Dversion=0.1 \
-Dpackage=org.apache.flink.quickstart \
-DinteractiveMode=false
Flink Scala project
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-scala \
-DarchetypeVersion=1.1.2 \
-DgroupId=org.apache.flink.quickstart \
-DartifactId=flink-scala-project \
-Dversion=0.1 \
-Dpackage=org.apache.flink.quickstart \
-DinteractiveMode=false
The generated Flink quickstart project is located in a folder called flink-java-project (flink-scala-project for Scala projects).
Clone and build the flink-training-exercises project
We need a few utility classes for the training exercises, which are provided as a Maven dependency. Please clone the Git repository, enter the cloned folder, and build the project as follows:
git clone http://github.com/fhueske/flink-training-exercises.git
cd flink-training-exercises
mvn clean install
Add the flink-training-exercises dependency to your Quickstart project
Open the pom.xml file in your Maven project (flink-java-project/pom.xml or flink-scala-project/pom.xml) with a text editor and add the following dependency to the dependencies section.
<dependency>
<groupId>com.dataartisans</groupId>
<artifactId>flink-training-exercises</artifactId>
<version>0.5</version>
</dependency>
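Before building the quickstart project, it can help to confirm that the mvn clean install step actually placed the training artifact in your local Maven repository. The sketch below assumes the default repository location (~/.m2/repository); if you configured a different location in your Maven settings, adjust the path. The helper name m2_artifact_dir is made up for this illustration.

```shell
# Build the path where Maven's default local repository stores an artifact.
# Dots in the group id become directory separators.
# m2_artifact_dir is a hypothetical helper; arguments are groupId, artifactId, version.
m2_artifact_dir() {
  group_path=$(printf '%s' "$1" | tr '.' '/')
  printf '%s/.m2/repository/%s/%s/%s\n' "$HOME" "$group_path" "$2" "$3"
}

# Coordinates taken from the dependency shown above.
dir=$(m2_artifact_dir com.dataartisans flink-training-exercises 0.5)
if [ -d "$dir" ]; then
  echo "installed: $dir"
else
  echo "not found: $dir (run 'mvn clean install' in flink-training-exercises first)"
fi
```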
Build the Flink Quickstart project
To test the generated project and download all required dependencies, run the following command in the flink-java-project folder (flink-scala-project for Scala projects).
mvn clean package
Maven will now start to download all required dependencies and build the Flink quickstart project.
3. Import the Flink Maven project into your IDE
The generated Maven project needs to be imported into your IDE:
- IntelliJ:
- Select “File” -> “Import Project”
- Select the root folder of your project
- Select “Import project from external model”, select “Maven”
- Leave default options and finish the import
- Eclipse:
- “File” -> “Import” -> “Maven” -> “Existing Maven Projects”
- Follow the import instructions
4. Execute and debug a Flink program in an IDE
Flink programs can be executed and debugged from within an IDE. This significantly eases the development process and gives a programming experience similar to working on a regular Java application. Starting a Flink program in your IDE is as easy as starting its main() method. Under the hood, the ExecutionEnvironment will start a local Flink instance within the execution process. Hence it is also possible to put breakpoints anywhere in your code and debug it.
Assuming you have an IDE with a Flink quickstart project imported, you can execute and debug the example WordCount program which is included in the quickstart project as follows:
- Open the org.apache.flink.quickstart.WordCount class in your IDE
- Place a breakpoint somewhere in the flatMap() method of the LineSplitter class, which is defined in the WordCount class
- Execute or debug the main() method of the WordCount class using your IDE.
5. Install Flink for local execution
In order to execute programs on a running Flink instance (rather than from within your IDE) you need to install Flink on your machine. To do so, follow these steps:
- Download the Apache Flink 1.1.2 release from http://flink.apache.org/downloads.html
- Extract the downloaded .tgz archive
The resulting folder contains a Flink setup that can be locally executed without any further configuration.
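The two steps above can be sketched as a small shell helper. This is an illustration under assumptions: the function name extract_flink is made up, the archive file name depends on the Hadoop and Scala variant you downloaded, and the start and stop script names are what we expect for the 1.1 release, so check the bin folder of your download.

```shell
# Extract a Flink release archive and print the top-level folder it unpacked to.
# Usage: extract_flink <archive.tgz> [target-dir]
# extract_flink is a hypothetical helper, not part of the Flink distribution.
extract_flink() {
  archive=$1
  target=${2:-.}
  tar -xzf "$archive" -C "$target" || return 1
  # Take the first archive entry and strip everything from the first slash on.
  tar -tzf "$archive" | sed -n '1s|/.*||p'
}

# Example usage (the archive name below is an assumption, use the file you downloaded):
#   flink_dir=$(extract_flink flink-1.1.2-bin-hadoop27-scala_2.10.tgz)
#   "$flink_dir"/bin/start-local.sh   # start a local Flink instance
#   "$flink_dir"/bin/stop-local.sh    # stop it again
```

If the local instance starts correctly, its web dashboard should be reachable at http://localhost:8081 in a default configuration.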
6. Download binaries for external systems
The training will also include exercises to show how Flink interacts with Kafka and Elasticsearch.
Please download the following binaries (make sure you pick the right version!):
Download Kafka 0.9.0.1: http://kafka.apache.org/downloads.html
Download Elasticsearch 2.3.5: https://www.elastic.co/downloads/past-releases/elasticsearch-2-3-5
Download Kibana 4.5.4: https://www.elastic.co/downloads/past-releases/kibana-4-5-4
7. Download training data
The training is based on a data set that can be downloaded at http://dataartisans.github.io/flink-training/trainingData/nycTaxiRides.gz
Please keep the file as it is and do not decompress it.
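Since the file stays compressed, a download that was cut off by a flaky connection is easy to miss. The following sketch checks the integrity of the archive without unpacking it to disk. The helper name check_gz is made up for this illustration, and the script assumes the file was saved as nycTaxiRides.gz in the current folder.

```shell
# Verify that a gzip file is complete and readable without writing its contents to disk.
# check_gz is a hypothetical helper built on the standard 'gzip -t' integrity test.
check_gz() {
  if gzip -t "$1" 2>/dev/null; then
    echo "ok: $1"
  else
    echo "corrupt or missing: $1"
    return 1
  fi
}

check_gz nycTaxiRides.gz || echo "Please download the file again."
```

To peek at the first records while leaving the file compressed, you can run gzip -dc nycTaxiRides.gz | head -n 3.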