diff --git a/docs/config.toml b/docs/config.toml index a7fd9b56a6..24246c0d50 100644 --- a/docs/config.toml +++ b/docs/config.toml @@ -81,3 +81,8 @@ posts = "/:year/:month/:day/:title/" languageName = '中文版' contentDir = 'content.zh' weight = 2 + +[languages.tr] + languageName = 'Türkçe' + contentDir = 'content.tr' + weight = 3 \ No newline at end of file diff --git a/docs/content.tr/_index.md b/docs/content.tr/_index.md new file mode 100644 index 0000000000..d7bbf51de6 --- /dev/null +++ b/docs/content.tr/_index.md @@ -0,0 +1,25 @@ +--- +title: Apache Flink® — Veri Akışları Üzerinde Durumlu Hesaplamalar +type: docs +home: true +--- + + +{{< recent_posts >}} diff --git a/docs/content.tr/documentation/_index.md b/docs/content.tr/documentation/_index.md new file mode 100644 index 0000000000..7fd951daa8 --- /dev/null +++ b/docs/content.tr/documentation/_index.md @@ -0,0 +1,26 @@ +--- +title: Dokümantasyon +bookCollapseSection: true +weight: 15 +menu_weight: 1 +--- + + +# Dokümantasyon diff --git a/docs/content.tr/documentation/flink-cdc-master.md b/docs/content.tr/documentation/flink-cdc-master.md new file mode 100644 index 0000000000..97f6fcdc07 --- /dev/null +++ b/docs/content.tr/documentation/flink-cdc-master.md @@ -0,0 +1,27 @@ +--- +weight: 7 +title: CDC Master (snapshot) +bookHref: "https://nightlies.apache.org/flink/flink-cdc-docs-master" +--- + + +# Flink CDC documentation (latest snapshot) + +{{< external_link name="You can find the Flink CDC documentation for the latest snapshot here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flink-cdc-stable.md b/docs/content.tr/documentation/flink-cdc-stable.md new file mode 100644 index 0000000000..027231337a --- /dev/null +++ b/docs/content.tr/documentation/flink-cdc-stable.md @@ -0,0 +1,27 @@ +--- +weight: 6 +title: CDC $FlinkCDCStableShortVersion (stable) +bookHref: "https://nightlies.apache.org/flink/flink-cdc-docs-stable" +--- + + +# Flink CDC documentation (latest stable release) +
+{{< external_link name="You can find the Flink CDC documentation for the latest stable release here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flink-kubernetes-operator-master.md b/docs/content.tr/documentation/flink-kubernetes-operator-master.md new file mode 100644 index 0000000000..0ad20a1b70 --- /dev/null +++ b/docs/content.tr/documentation/flink-kubernetes-operator-master.md @@ -0,0 +1,27 @@ +--- +weight: 5 +title: Kubernetes Operator Main (snapshot) +bookHref: "https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main" +--- + + +# Flink Kubernetes Operator documentation (latest snapshot) + +{{< external_link name="You can find the Flink Kubernetes Operator documentation for the latest snapshot here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flink-kubernetes-operator-stable.md b/docs/content.tr/documentation/flink-kubernetes-operator-stable.md new file mode 100644 index 0000000000..02ea0ec743 --- /dev/null +++ b/docs/content.tr/documentation/flink-kubernetes-operator-stable.md @@ -0,0 +1,27 @@ +--- +weight: 4 +title: Kubernetes Operator $FlinkKubernetesOperatorStableShortVersion (latest) +bookHref: "https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/" +--- + + +# Flink Kubernetes Operator documentation (latest stable release) + +{{< external_link name="You can find the Flink Kubernetes Operator documentation for the latest stable release here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flink-lts.md b/docs/content.tr/documentation/flink-lts.md new file mode 100644 index 0000000000..a453d652fc --- /dev/null +++ b/docs/content.tr/documentation/flink-lts.md @@ -0,0 +1,27 @@ +--- +weight: 2 +title: Flink $FlinkLTSShortVersion (LTS) +bookHref: "https://nightlies.apache.org/flink/flink-docs-lts/" +--- + + +# Flink documentation (latest LTS release) + +{{< external_link name="You can find the Flink documentation for the latest Long-Term 
Support (LTS) release here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flink-master.md b/docs/content.tr/documentation/flink-master.md new file mode 100644 index 0000000000..937bc92084 --- /dev/null +++ b/docs/content.tr/documentation/flink-master.md @@ -0,0 +1,27 @@ +--- +weight: 3 +title: Flink Master (snapshot) +bookHref: "https://nightlies.apache.org/flink/flink-docs-master/" +--- + + +# Flink documentation (latest snapshot) + +{{< external_link name="You can find the Flink documentation for the latest snapshot here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flink-stable.md b/docs/content.tr/documentation/flink-stable.md new file mode 100644 index 0000000000..ef210c2671 --- /dev/null +++ b/docs/content.tr/documentation/flink-stable.md @@ -0,0 +1,27 @@ +--- +weight: 1 +title: Flink $FlinkStableShortVersion (stable) +bookHref: "https://nightlies.apache.org/flink/flink-docs-stable/" +--- + + +# Flink documentation (latest stable release) + +{{< external_link name="You can find the Flink documentation for the latest stable release here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flink-stateful-functions-master.md b/docs/content.tr/documentation/flink-stateful-functions-master.md new file mode 100644 index 0000000000..4f41dae05c --- /dev/null +++ b/docs/content.tr/documentation/flink-stateful-functions-master.md @@ -0,0 +1,27 @@ +--- +weight: 11 +title: Stateful Functions Master (snapshot) +bookHref: "https://nightlies.apache.org/flink/flink-statefun-docs-master" +--- + + +# Flink Stateful Functions documentation (latest snapshot) + +{{< external_link name="You can find the Flink Stateful Functions documentation for the latest snapshot here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flink-stateful-functions-stable.md b/docs/content.tr/documentation/flink-stateful-functions-stable.md new file mode 100644 index 0000000000..69971a9b26 ---
/dev/null +++ b/docs/content.tr/documentation/flink-stateful-functions-stable.md @@ -0,0 +1,27 @@ +--- +weight: 10 +title: Stateful Functions $StateFunStableShortVersion (stable) +bookHref: "https://nightlies.apache.org/flink/flink-statefun-docs-stable/" +--- + + +# Flink Stateful Functions documentation (latest stable release) + +{{< external_link name="You can find the Flink Stateful Functions documentation for the latest stable release here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flinkml-master.md b/docs/content.tr/documentation/flinkml-master.md new file mode 100644 index 0000000000..73c4fbc8fb --- /dev/null +++ b/docs/content.tr/documentation/flinkml-master.md @@ -0,0 +1,27 @@ +--- +weight: 9 +title: ML Master (snapshot) +bookHref: "https://nightlies.apache.org/flink/flink-ml-docs-master" +--- + + +# Flink ML documentation (latest snapshot) + +{{< external_link name="You can find the Flink ML documentation for the latest snapshot here.">}} \ No newline at end of file diff --git a/docs/content.tr/documentation/flinkml-stable.md b/docs/content.tr/documentation/flinkml-stable.md new file mode 100644 index 0000000000..225170a0ec --- /dev/null +++ b/docs/content.tr/documentation/flinkml-stable.md @@ -0,0 +1,27 @@ +--- +weight: 8 +title: ML $FlinkMLStableShortVersion (stable) +bookHref: "https://nightlies.apache.org/flink/flink-ml-docs-stable/" +--- + + +# Flink ML documentation (latest stable release) + +{{< external_link name="You can find the Flink ML documentation for the latest stable release here.">}} \ No newline at end of file diff --git a/docs/content.tr/downloads.md b/docs/content.tr/downloads.md new file mode 100644 index 0000000000..89ff79e509 --- /dev/null +++ b/docs/content.tr/downloads.md @@ -0,0 +1,172 @@ +--- +title: İndirmeler +bookCollapseSection: false +weight: 5 +menu_weight: 2 +--- + + +# Apache Flink® İndirmeleri + +## Apache Flink + +Apache Flink® {{< param FlinkStableVersion >}}, en son kararlı sürümdür.
+ +{{% flink_download "flink" %}} + +## Apache Flink konnektörleri + +Bunlar, ana Flink sürümlerinden ayrı olarak yayınlanan konnektörlerdir. + +{{% flink_download "flink_connectors" %}} + +## Apache Flink CDC + +Apache Flink® CDC {{< param FlinkCDCStableShortVersion >}}, en son kararlı sürümdür. + +{{% flink_download "flink_cdc" %}} + +## Apache Flink Stateful Functions + +Apache Flink® Stateful Functions {{< param StateFunStableShortVersion >}}, en son kararlı sürümdür. + +{{% flink_download "statefun" %}} + +## Apache Flink ML + +Apache Flink® ML {{< param FlinkMLStableShortVersion >}}, en son kararlı sürümdür. + +{{% flink_download "flink_ml" %}} + +## Apache Flink Kubernetes Operator + +Apache Flink® Kubernetes Operator {{< param FlinkKubernetesOperatorStableShortVersion >}}, en son kararlı sürümdür. + +{{% flink_download "flink_kubernetes_operator" %}} + +## Ek Bileşenler + +Bunlar, Flink projesinin geliştirdiği ve ana Flink sürümünün bir parçası olmayan bileşenlerdir: + +{{% flink_download "additional_components" %}} + +## Hash'leri ve İmzaları Doğrulama + +Sürümlerimizle birlikte, `*.sha512` dosyalarında sha512 hash'leri ve `*.asc` dosyalarında kriptografik imzalar da sağlıyoruz. Apache Software Foundation, [KEYS](https://downloads.apache.org/flink/KEYS) dosyasındaki sürüm imzalama anahtarlarından herhangi birini kullanarak takip edebileceğiniz kapsamlı bir [hash ve imza doğrulama eğitimi](http://www.apache.org/info/verification.html) sunmaktadır. + +## Maven Bağımlılıkları + +### Apache Flink + +Projenize Apache Flink'i dahil etmek için `pom.xml` dosyanıza aşağıdaki bağımlılıkları ekleyebilirsiniz. Bu bağımlılıklar yerel bir yürütme ortamı içerir ve böylece yerel testleri destekler. + +- **Scala API**: Scala API'sini kullanmak için, `flink-java` artifact id'sini `flink-scala_2.12` ile ve `flink-streaming-java`'yı `flink-streaming-scala_2.12` ile değiştirin.
+ +```xml +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>{{< param FlinkStableVersion >}}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>{{< param FlinkStableVersion >}}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>{{< param FlinkStableVersion >}}</version> +</dependency> +``` + +### Apache Flink Stateful Functions + +Projenize Apache Flink Stateful Functions'ı dahil etmek için `pom.xml` dosyanıza aşağıdaki bağımlılıkları ekleyebilirsiniz. + +```xml +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>statefun-sdk</artifactId> + <version>{{< param StateFunStableVersion >}}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>statefun-flink-harness</artifactId> + <version>{{< param StateFunStableVersion >}}</version> +</dependency> +``` + +`statefun-sdk` bağımlılığı, uygulamalar geliştirmeye başlamak için ihtiyacınız olan tek bağımlılıktır. +`statefun-flink-harness` bağımlılığı, uygulamanızı bir IDE'de yerel olarak test etmenizi sağlayan yerel bir yürütme ortamı içerir. + +### Apache Flink ML + +Projenize Apache Flink ML'yi dahil etmek için `pom.xml` dosyanıza aşağıdaki bağımlılıkları ekleyebilirsiniz. + +```xml +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-ml-core</artifactId> + <version>{{< param FlinkMLStableVersion >}}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-ml-iteration</artifactId> + <version>{{< param FlinkMLStableVersion >}}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-ml-lib</artifactId> + <version>{{< param FlinkMLStableVersion >}}</version> +</dependency> +``` + +İleri düzey kullanıcılar, hedef kullanım senaryoları için yalnızca minimum Flink ML bağımlılık setini içe aktarabilirler: + +- Özel ML algoritmaları geliştirmek için `flink-ml-core` artifact'ini kullanın. +- İterasyon gerektiren özel ML algoritmaları geliştirmek için `flink-ml-core` ve `flink-ml-iteration` artifact'lerini kullanın. +- Flink ML'den hazır ML algoritmalarını kullanmak için `flink-ml-lib` artifact'ini kullanın. + +### Apache Flink Kubernetes Operator + +Projenize Apache Flink Kubernetes Operator'ı dahil etmek için `pom.xml` dosyanıza aşağıdaki bağımlılıkları ekleyebilirsiniz.
+ +```xml +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-kubernetes-operator</artifactId> + <version>{{< param FlinkKubernetesOperatorStableVersion >}}</version> +</dependency> +``` + +## Eski sürümler için Güncelleme Politikası + +Mart 2017 itibariyle, Flink topluluğu mevcut ve önceki küçük sürümü hata düzeltmeleri ile destekleme [kararı aldı](https://lists.apache.org/thread/qf4hot3gb1dgvh4csxv2317263b6omm4). Eğer 1.2.x mevcut sürüm ise, 1.1.y desteklenen önceki küçük sürümdür. Her iki sürüm de kritik sorunlar için hata düzeltmeleri alacaktır. + +Mart 2023 itibariyle, Flink topluluğu yeni bir Flink küçük sürümünün yayınlanmasıyla birlikte, desteğini kaybeden Flink küçük sürümündeki çözülmüş kritik/engelleyici sorunlar için son bir hata düzeltme sürümü yapma [kararı aldı](https://lists.apache.org/thread/9w99mgx3nw5tc0v26wcvlyqxrcrkpzdz). Eğer 1.16.1 mevcut sürüm ve 1.15.4 en son önceki yama sürümü ise, 1.17.0 yayınlandığında çözülmüş kritik/engelleyici sorunları temizlemek için bir 1.15.5 oluşturacağız. + +Topluluğun her zaman daha eski sürümler için hata düzeltme sürümlerini tartışmaya açık olduğunu unutmayın. Bunun için lütfen geliştiricilerle dev@flink.apache.org e-posta listesinden iletişime geçin. + +## Tüm kararlı sürümler + +Tüm Flink sürümleri, sağlama toplamları ve kriptografik imzalar dahil olmak üzere [https://archive.apache.org/dist/flink/](https://archive.apache.org/dist/flink/) üzerinden erişilebilir. Bu yazının yazıldığı sırada, bu aşağıdaki sürümleri içermektedir: + +{{% flink_archive "release_archive" %}} \ No newline at end of file diff --git a/docs/content.tr/flink-packages.md b/docs/content.tr/flink-packages.md new file mode 100644 index 0000000000..a0de5c5d24 --- /dev/null +++ b/docs/content.tr/flink-packages.md @@ -0,0 +1,27 @@ +--- +title: flink-packages.org +bookCollapseSection: false +bookHref: "https://flink-packages.org/" +--- + + +# flink-packages.org
+ +All information about flink-packages can be found on the [flink-packages website](https://flink-packages.org). \ No newline at end of file diff --git a/docs/content.tr/getting-started/_index.md b/docs/content.tr/getting-started/_index.md new file mode 100644 index 0000000000..f2b86d35b6 --- /dev/null +++ b/docs/content.tr/getting-started/_index.md @@ -0,0 +1,26 @@ +--- +title: Başlarken +bookCollapseSection: true +weight: 10 +menu_weight: 1 +--- + + +# Başlarken diff --git a/docs/content.tr/getting-started/training-course.md b/docs/content.tr/getting-started/training-course.md new file mode 100644 index 0000000000..77922d50e4 --- /dev/null +++ b/docs/content.tr/getting-started/training-course.md @@ -0,0 +1,27 @@ +--- +weight: 6 +title: Eğitim Kursu +bookHref: "https://nightlies.apache.org/flink/flink-docs-stable/docs/learn-flink/overview/" +--- + + +# Eğitim Kursu + +{{< external_link name="Flink Eğitim Kursu hakkında tüm bilgileri buradan okuyabilirsiniz.">}} \ No newline at end of file diff --git a/docs/content.tr/getting-started/with-flink-cdc.md b/docs/content.tr/getting-started/with-flink-cdc.md new file mode 100644 index 0000000000..ede0be3a15 --- /dev/null +++ b/docs/content.tr/getting-started/with-flink-cdc.md @@ -0,0 +1,27 @@ +--- +weight: 3 +title: Flink CDC +bookHref: "https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/get-started/introduction/" +--- + + +# Getting Started with Flink CDC + +{{< external_link name="Read how you can get started with Flink CDC here.">}} diff --git a/docs/content.tr/getting-started/with-flink-kubernetes-operator.md b/docs/content.tr/getting-started/with-flink-kubernetes-operator.md new file mode 100644 index 0000000000..da8149151a --- /dev/null +++ b/docs/content.tr/getting-started/with-flink-kubernetes-operator.md @@ -0,0 +1,27 @@ +--- +weight: 2 +title: Flink Kubernetes Operator +bookHref:
"https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/docs/try-flink-kubernetes-operator/quick-start/" +--- + + +# Getting Started with Flink Kubernetes Operator + +{{< external_link name="Read how you can get started with Flink Kubernetes Operator here.">}} \ No newline at end of file diff --git a/docs/content.tr/getting-started/with-flink-ml.md b/docs/content.tr/getting-started/with-flink-ml.md new file mode 100644 index 0000000000..9c83a2bf33 --- /dev/null +++ b/docs/content.tr/getting-started/with-flink-ml.md @@ -0,0 +1,27 @@ +--- +weight: 4 +title: Flink ML +bookHref: "https://nightlies.apache.org/flink/flink-ml-docs-stable/docs/try-flink-ml/quick-start/" +--- + + +# Getting Started with Flink ML + +{{< external_link name="Read how you can get started with Flink ML here.">}} \ No newline at end of file diff --git a/docs/content.tr/getting-started/with-flink-stateful-functions.md b/docs/content.tr/getting-started/with-flink-stateful-functions.md new file mode 100644 index 0000000000..e7400613b0 --- /dev/null +++ b/docs/content.tr/getting-started/with-flink-stateful-functions.md @@ -0,0 +1,27 @@ +--- +weight: 5 +title: Flink Stateful Functions +bookHref: "https://nightlies.apache.org/flink/flink-statefun-docs-stable/getting-started/project-setup.html" +--- + + +# Getting Started with Flink Stateful Functions + +{{< external_link name="Read how you can get started with Flink Stateful Functions here.">}} \ No newline at end of file diff --git a/docs/content.tr/getting-started/with-flink.md b/docs/content.tr/getting-started/with-flink.md new file mode 100644 index 0000000000..4e78c8ea20 --- /dev/null +++ b/docs/content.tr/getting-started/with-flink.md @@ -0,0 +1,27 @@ +--- +weight: 1 +title: Flink +bookHref: "https://nightlies.apache.org/flink/flink-docs-stable/docs/try-flink/local_installation/" +--- + + +# Getting Started with Flink + +{{< external_link name="Read how you can get started with Flink here.">}} \ No newline at end of file diff 
--git a/docs/content.tr/how-to-contribute/_index.md b/docs/content.tr/how-to-contribute/_index.md new file mode 100644 index 0000000000..87fcdfce75 --- /dev/null +++ b/docs/content.tr/how-to-contribute/_index.md @@ -0,0 +1,26 @@ +--- +title: Nasıl Katkıda Bulunulur +bookCollapseSection: true +weight: 20 +menu_weight: 1 +--- + + +# Nasıl katkıda bulunulur \ No newline at end of file diff --git a/docs/content.tr/how-to-contribute/code-style-and-quality-common.md b/docs/content.tr/how-to-contribute/code-style-and-quality-common.md new file mode 100644 index 0000000000..e8dc67bf67 --- /dev/null +++ b/docs/content.tr/how-to-contribute/code-style-and-quality-common.md @@ -0,0 +1,363 @@ +--- +title: Kod Stili ve Kalite Kılavuzu — Genel Kurallar +bookCollapseSection: false +bookHidden: true +--- + +# Kod Stili ve Kalite Kılavuzu — Genel Kurallar + +#### [Önsöz]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}}) +#### [Pull Request'ler ve Değişiklikler]({{< relref "how-to-contribute/code-style-and-quality-pull-requests" >}}) +#### [Genel Kodlama Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-common" >}}) +#### [Java Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-java" >}}) +#### [Scala Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-scala" >}}) +#### [Bileşenler Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-components" >}}) +#### [Biçimlendirme Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-formatting" >}}) + +
+ +## 1. Telif Hakkı + +Her dosya, başlık olarak Apache lisans bilgisini içermelidir. + +``` +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +``` + +## 2. Araçlar + +IDE araçlarını yapılandırmak için {{< docs_link file="flink-docs-stable/docs/flinkdev/ide_setup/" name="IDE Kurulum Kılavuzu">}}'nu takip etmenizi öneririz. + + + + +### Uyarılar + +* Sıfır uyarıya ulaşmaya çalışırız +* Mevcut kodda birçok uyarı olsa da, yeni değişiklikler ek derleyici uyarıları eklememeli +* Uyarıyı sağlıklı bir şekilde ele almak mümkün değilse (jeneriklerle çalışırken bazı durumlarda), uyarıyı bastırmak için bir ek açıklama ekleyin +* Yöntemleri kullanımdan kaldırırken, bunun ek uyarılar getirmediğinden emin olun + + + +## 3. Yorumlar ve Kod Okunabilirliği + + +### Yorumlar + +**Altın kural: Kodun anlaşılmasını desteklemek için gerektiği kadar yorum ekleyin, ancak gereksiz bilgi eklemeyin.** + +Şunları düşünün: + +* Kod ne yapıyor? +* Kod bunu nasıl yapıyor? +* Kod neden böyle? + +Kodun kendisi, mümkün olduğunca "ne" ve "nasıl" sorularını açıklamalıdır. + +* Sınıfların rollerini ve yöntemlerin sözleşmelerini, yöntem adından açıkça anlaşılamadığı durumlarda tanımlamak için JavaDoc'ları kullanın ("ne" sorusu). 
+* Kodun akışı, "nasıl" sorusuna iyi bir açıklama sağlamalıdır. + Değişken ve yöntem adlarını, kodun kendini belgelendirmesinin bir parçası olarak düşünün. +* Bir birim oluşturan daha büyük blokları, o bloğun ne yaptığını tanımlayan açıklayıcı bir isimle özel bir yönteme taşımak, kodun okunmasını genellikle kolaylaştırır. + +Kod içi yorumlar, "neden" sorusunu açıklamaya yardımcı olur. + +* Örneğin `// bu belirli kod düzeni JIT'in şunu veya bunu daha iyi yapmasına yardımcı olur` +* Veya `// bu alanı burada null yapmak, gelecekteki yazma girişimlerinin daha hızlı başarısız olması anlamına gelir` +* Veya `// bu yöntemin gerçekte çağrıldığı argümanlarla, görünüşte naif olan bu yaklaşım aslında optimize edilmiş/akıllı sürümlerden daha iyi çalışır` + +Kod içi yorumlar, kodun kendisinde zaten açık olan "ne" ve "nasıl" hakkında gereksiz bilgiler belirtmemelidir. + +JavaDoc'lar anlamsız bilgiler belirtmemelidir (sadece Checkstyle denetleyicisini memnun etmek için). + +__Yapmayın:__ + +``` +/** + * The symbol expression. + */ +public class CommonSymbolExpression {} +``` +__Yapın:__ + +``` +/** + * An expression that wraps a single specific symbol. + * A symbol could be a unit, an alias, a variable, etc. + */ +public class CommonSymbolExpression {} +``` + + +### Dallar ve İç İçe Geçme + +If koşullarını tersine çevirip erken çıkış yaparak derin iç içe geçmiş kapsamlardan kaçının. + +__Yapmayın:__ + +``` +if (a) { + if (b) { + if (c) { + the main path + } + } +} +``` + +__Yapın:__ + +``` +if (!a) { + return .. +} + +if (!b) { + return ... +} + +if (!c) { + return ... +} + +the main path +``` + + +## 4. Tasarım ve Yapı + +İyi tasarımı tam olarak tanımlamak zor olsa da, iyi tasarıma işaret eden bazı özellikler vardır. Bu özellikler mevcutsa, tasarımın iyi bir yönde ilerleme ihtimali yüksektir. Bu özellikler elde edilemiyorsa, tasarımın hatalı olma ihtimali yüksektir. + + +### Immutability ve Eager Initialization + +1.
Mümkün olduğunca immutable türler kullanmayı deneyin; özellikle API'ler, mesajlar, tanımlayıcılar, özellikler, yapılandırma vb. için. +2. İyi bir genel yaklaşım, bir sınıfın mümkün olduğunca çok alanını `final` yapmayı denemektir. +3. Haritalarda anahtar olarak kullanılan sınıflar tamamen değişmez olmalı ve yalnızca `final` alanlar içermelidir (geç hesaplanan hash kodları gibi yardımcı alanlar hariç olabilir). +4. Sınıfları hevesli (eager) biçimde ilklendirin; nesne, kurucu tamamlandığı anda tamamen kullanılabilir olmalıdır. + + +### Değiştirilebilir Alanların Null Olabilirliği + +Flink kod tabanı, null olabilirlik (nullability) konusunda aşağıdaki konvansiyonları takip etmeyi hedefler: + +* Alanlar, parametreler ve dönüş türleri, aksi belirtilmedikçe her zaman null olmayan kabul edilir. +* Null olabilecek tüm alanlar, parametreler ve dönüş türleri `@javax.annotation.Nullable` ile işaretlenmelidir. + Böylece IntelliJ, null değerini hesaba katmanız gereken tüm bölümler hakkında sizi uyarır. +* Ek açıklaması olmayan tüm değişken (final olmayan) alanlar için varsayım, alan değeri değişse de alanın her zaman bir değer taşıdığıdır. + * Bu, değişken bir alanın değerinin değiştiği her noktada, alanın gerçekten null olamayacağının iki kez kontrol edilmesi gerektiği anlamına gelir. + +_Not: Bu, `@Nonnull` ek açıklamalarının genellikle gerekli olmadığı, ancak önceki bir `@Nullable` ek açıklamasını geçersiz kılmak veya null olamayacak bir bağlama işaret etmek için kullanılabileceği anlamına gelir._ + +`Optional`, null olabilen dönüş değerleri için iyi bir çözümdür; bu nedenle null dönebilen dönüş türleri `Optional` ile değiştirilebilir. Bkz. [Java Optional'ın kullanımı]({{< relref "how-to-contribute/code-style-and-quality-java" >}}#java-optional). + + +### Kod Tekrarını Önleme + +1. Kodu kopyalamak ya da benzer bir işlevi başka bir yerde yeniden yazmak yerine, her zaman mevcut kodu yeniden kullanmanın/soyutlamanın yollarını düşünün. +2.
Farklı özelleşmeler arasındaki ortak davranışlar, ortak bir bileşene (veya paylaşılan bir üst sınıfa) çıkarılmalıdır. +3. Dizeleri veya diğer özel değerleri farklı yerlerde tekrarlamak yerine her zaman `private static final` sabitler kullanın. Sabitler, sınıfın en üstündeki üye alanı bölümünde bildirilmelidir. + + +### Test Edilebilirlik İçin Tasarım + +İyi test edilebilen kod, genellikle sorumlulukların iyi ayrıştırıldığı ve yeniden kullanılabilir bir yapıya sahiptir. + +Sorunların/belirtilerin ve önerilen yeniden düzenlemelerin bir özeti, aşağıdaki bağlantıdaki PDF'de bulunabilir. PDF'deki örneklerin genellikle bir bağımlılık enjeksiyonu çerçevesi (Guice) kullandığını, ancak aynı yaklaşımın bu çerçeve olmadan da aynı şekilde çalıştığını unutmayın. + +[http://misko.hevery.com/attachments/Guide-Writing%20Testable%20Code.pdf](http://misko.hevery.com/attachments/Guide-Writing%20Testable%20Code.pdf) + +Burada en önemli noktaların kısa bir özeti bulunmaktadır. + + +**Bağımlılıkları Enjekte Et** + +Yeniden kullanılabilirlik için kurucular, bağımlılıklarını (alanlara atanan nesneleri) kendileri oluşturmak yerine parametre olarak kabul etmelidir. + +* Pratikte, kurucularda `new` anahtar kelimesi bulunmamalıdır. +* İstisnalar, boş bir koleksiyon (`new ArrayList<>()`) veya yalnızca temel türlere bağımlı benzer yardımcı nesneler oluşturmaktır. + +Nesnenin tüm bağımlılıklarıyla birlikte kolayca oluşturulabilmesi için fabrika yöntemleri veya ek kolaylık kurucuları ekleyin. + +Bir testte nesne alanlarını değiştirmek için asla yansımaya, "Whitebox" yardımcı sınıfına veya PowerMock'a ihtiyaç duymamalısınız. + + +**Çok Sayıda "Collaborator" Kullanmaktan Kaçının** + +Bir şeyi test ederken çok sayıda başka bileşeni de devreye sokmanız gerekiyorsa (çok sayıda "collaborator"), yeniden düzenlemeyi düşünün. + +Test etmek istediğiniz bileşen/sınıf, gereken minimal arayüze (soyutlamaya) bağlanmak yerine muhtemelen başka bir geniş bileşene (ve onun uygulamasına) doğrudan bağlıdır.
+ +Bu durumda, gereken minimal arayüzü çıkarın ve test için bu arayüzün bir test stub'ını sağlayın. + +* Örneğin, `S3RecoverableMultiPartUploader`'ı test etmek gerçek S3 erişimi gerektiriyorsa, + S3 erişimini bir arayüzün arkasına alın ve testte bu arayüzü bir test stub ile değiştirin. +* Bu, doğal olarak bağımlılıkları enjekte etmeyi gerektirir (bkz. yukarıda) + +⇒ Bu adımların testleri yazmak için genellikle daha fazla çaba gerektirdiğini, ancak testleri diğer bileşenlerdeki değişikliklere karşı daha dayanıklı kıldığını unutmayın; yani diğer bileşenlerde değişiklik yapıldığında testleri güncellemeniz gerekmez. + +### Performans Duyarlılığı + +Kodu kabaca "koordinasyon" kodu ve "veri işleme" kodu olarak ikiye ayırabiliriz. Koordinasyon kodunda her zaman basitlik ve anlaşılırlık tercih edilmelidir. Veri işleme kodu ise performans açısından son derece kritiktir ve performans için optimize edilmelidir. + +Bu, yukarıdaki bölümlerdeki genel fikirlerin yine de uygulanması, ancak daha yüksek performans uğruna bazı noktalarda bazı yönlerden ödün verilebilmesi anlamına gelir. + + +**Hangi kod yolları veri işleme yollarıdır?** + +* Kayıt başına kod yolları: her kayıt için çağrılan yöntemler ve kod yolları. Örneğin; Bağlayıcılar, Serileştiriciler, Durum Arka Uçları (State Backend), Formatlar, Görevler, Operatörler, Metrikler, çalışma zamanı veri yapıları vb. +* I/O yöntemleri: mesajları veya veri parçalarını tamponlar halinde taşıma. Örnekler; RPC sistemi, Ağ Yığını, Dosya Sistemleri, Kodlayıcılar / Kod Çözücüler vb. + + +**Performans açısından kritik kodun, normalde kaçınacağımız halde yapabileceği şeyler** + +* GC üzerindeki baskıyı azaltmak için mutable nesneler kullanmak (ve yeniden kullanmak), yani bazen immutability'den ödün vermek. +* Temel türleri, temel tür dizilerini veya MemorySegment/ByteBuffer'ları kullanmak ve anlamı ayrı sınıflar ve nesneler yerine temel türlere ve byte dizilerine kodlamak. +* Kodu, pahalı işleri (ayırma, arama, sanal yöntem çağrıları, vb.)
birçok kayıt üzerinde amortize edecek şekilde yapılandırmak. +* Kod düzenini okunabilirlik için değil, JIT derleyicisi için optimize etmek. Örnekler: başka sınıfların alanlarını yerel olarak önbelleğe almak (JIT'in bunu çalışma zamanında kendiliğinden yapacağı şüpheliyse) veya kodu JIT derleyicisinin satır içine alma (inlining), döngü açma, vektörleştirme gibi iyileştirmelerine yardımcı olacak şekilde yapılandırmak. + + + +## 5. Eşzamanlılık ve İş Parçacığı + +**Kod yollarının büyük çoğunluğu eşzamanlılık gerektirmez.** Doğru dahili soyutlamalar, neredeyse tüm durumlarda eşzamanlılık ihtiyacını ortadan kaldırır. + +* Flink çekirdeği ve çalışma zamanı, bu yapı taşlarını sağlamak için eşzamanlılık kullanır. + Örnekler: RPC sistemi, Ağ Yığını, Görevlerin posta kutusu (mailbox) modeli veya bazı önceden tanımlı Source / Sink yardımcıları. +* Henüz tamamen bu noktada değiliz, ancak kendi eşzamanlılığını uygulayan her yeni eklenti, temel sistem yapı taşları kategorisine girmedikçe dikkatle incelenmelidir. +* Katkıda bulunanlar, eşzamanlı kod uygulamaları gerektiğini düşünüyorlarsa, mevcut bir soyutlama/yapı taşı olup olmadığını veya yeni bir tane eklenip eklenmemesi gerektiğini görüşmek için committer'lar ile iletişime geçmelidir. + + +**Bir bileşen geliştirirken iş parçacığı modeli ve senkronizasyon noktaları hakkında önceden düşünün.** + +* Örneğin: tek iş parçacıklı, engelleyici, engelleyici olmayan, senkron, asenkron, çok iş parçacıklı, iş parçacığı havuzu, mesaj kuyrukları, volatile, senkronize blok/yöntemler, muteksler, atomikler, geri çağrılar, … +* Bu şeyleri doğru yapmak ve bunlar hakkında önceden düşünmek, sınıf arayüzlerini/sorumluluklarını tasarlamaktan bile daha önemlidir, çünkü sonradan değiştirmek çok daha zordur.
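Yukarıda sözü edilen posta kutusu (mailbox) modelinin ardındaki fikir, yalnızca JDK kullanan küçük ve varsayımsal bir taslakla şöyle gösterilebilir (sınıf ve yöntem adları bu örnek için uydurulmuştur; Flink'in gerçek mailbox uygulaması değildir):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Varsayımsal taslak: paylaşılan durumu kilitlerle korumak yerine,
// tüm güncellemeleri tek iş parçacıklı bir yürütücüye ("posta kutusuna") sıralar.
class MailboxCounter {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count; // yalnızca mailbox iş parçacığından erişilir; senkronizasyon gerekmez

    void increment() {
        // Güncelleme, çağıran iş parçacığında değil, posta kutusu iş parçacığında çalışır.
        mailbox.execute(() -> count++);
    }

    long shutdownAndGet() throws InterruptedException {
        mailbox.shutdown();
        // Engelleyerek beklerken her zaman bir zaman aşımı kullanın ve aşımı açıkça ele alın.
        if (!mailbox.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("mailbox did not terminate in time");
        }
        return count;
    }
}
```

Tüm erişim tek bir iş parçacığına sıralandığı için `count` alanında kilit ya da `volatile` gerekmez; eşzamanlılık bileşenin kendi içinde değil, yeniden kullanılan yapı taşında (yürütücüde) yaşar.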
+ + +**Mümkünse her şekilde iş parçacıklarını kullanmaktan kaçınmaya çalışın.** + +* Eğer bir iş parçacığı başlatmak için bir durumunuz olduğunu düşünüyorsanız, bunu açıkça incelenmesi gereken bir şey olarak pull request'te belirtin. + + +**İş parçacıklarını kullanmanın başlangıçta göründüğünden çok daha zor olduğunu unutmayın** + +* İş parçacıklarının temiz bir şekilde kapatılması çok zordur. +* Kesintileri sağlam bir şekilde ele almak (hem yavaş kapatmadan hem de canlı kilitlerden kaçınmak) neredeyse bir Java Sihirbazı gerektirir. +* İş parçacıklarından temiz hata yayılımını tüm durumlarda sağlamak, titiz bir tasarım gerektirir. +* Çok iş parçacıklı uygulama/bileşen/sınıfın karmaşıklığı, her ek senkronizasyon noktası/blok/kritik bölüm ile üstel olarak artar. Kodunuz başlangıçta anlaşılması kolay olabilir, ancak hızlı bir şekilde bu noktanın ötesine geçebilir. +* Çok iş parçacıklı kodun düzgün şekilde test edilmesi temel olarak imkansızdır, alternatif yaklaşımlar (asenkron kod, engelleyici olmayan kod, mesaj kuyrukları ile aktör modeli gibi) test edilmesi oldukça kolaydır. +* Genellikle çok iş parçacıklı kod, modern donanımda alternatif yaklaşımlara kıyasla daha az verimlidir. + + +**java.util.concurrent.CompletableFuture'dan haberdar olun** + +* Diğer eşzamanlı kodlarda olduğu gibi, bir CompletableFuture kullanmaya nadiren ihtiyaç olmalıdır. +* Bir future'ı tamamlamak, açıkça bir tamamlama yürütücüsü belirtilmedikçe, sonucun tamamlanmasını bekleyen zincirlenmiş herhangi bir future'ı çağrı iş parçacığında da tamamlayacaktır. +* Bu, örneğin Scheduler / ExecutionGraph'ın bölümlerinde olduğu gibi, tüm yürütme senkron / tek iş parçacıklı olması gerekiyorsa kasıtlı olabilir. + * Flink, tek iş parçacıklı bir RPC uç noktası çalıştığı gibi aynı iş parçacığında zincirlenmiş işleyicileri çağırmaya izin vermek için bir "ana iş parçacığı yürütücüsü" kullanır. +* Bu, future'ı tamamlayan iş parçacığı hassas bir iş parçacığı ise beklenmedik olabilir. 
 * Bu durumda, bir yürütücü mevcut olduğunda `future.complete(value)` yerine `CompletableFuture.supplyAsync(value, executor)` kullanmak daha iyi olabilir. +* Bir future'ın tamamlanmasını beklerken engellendiğinizde, her zaman sonuç için bir zaman aşımı sağlayın ve zaman aşımlarını açıkça ele alın. +* Tüm sonuçları / herhangi bir sonucu / tüm sonuçları (yaklaşık) tamamlanma sırasına göre beklemek için sırasıyla `CompletableFuture.allOf()`/`anyOf()`, `ExecutorCompletionService` veya `org.apache.flink.runtime.concurrent.FutureUtils#waitForAll` kullanın. + + + + +## 6. Bağımlılıklar ve Modüller + +* **Bağımlılık ayak izini küçük tutun** + * Ne kadar çok bağımlılık olursa, topluluğun bunları bir bütün olarak yönetmesi o kadar zorlaşır. + * Bağımlılık yönetimi; bağımlılık çakışmalarını çözmeyi, lisansları ve ilgili bildirimleri sürdürmeyi ve güvenlik açıklarını ele almayı içerir. + * Gelecekteki çakışmaları önlemek için bağımlılığın gölgelenmesi/yeniden konumlandırılması gerekip gerekmediğini tartışın. +* **Sadece bir yöntem için bağımlılık eklemeyin** + * Mümkünse Java'nın yerleşik araçlarını kullanın. + * Eğer yöntem Apache lisanslı ise, uygun atıf ile yöntemi bir Flink yardımcı sınıfına kopyalayabilirsiniz. +* **Bağımlılıkların beyanı** + * Açıkça dayandığınız bağımlılıkları beyan edin; ister doğrudan içe aktarıp kullandığınız sınıfları sağlasın, ister Log4J gibi doğrudan kullandığınız bir hizmet sağlasın. + * Geçişli bağımlılıklar, yalnızca çalışma zamanında ihtiyaç duyulan ancak kendinizin kullanmadığı bağımlılıkları sağlamalıdır. + * [[kaynak](https://stackoverflow.com/questions/15177661/maven-transitive-dependencies)] +* **Maven modüllerindeki sınıfların konumu** + * Yeni bir sınıf oluşturduğunuzda, nereye koyacağınızı düşünün. + * Bir sınıf gelecekte birden fazla modül tarafından kullanılabilir ve bu durumda bir `common` modülüne ait olabilir. + + + +## 7.
Test Etme + +### Araçlar + +Kod tabanımızı test çerçevesi ve iddia kütüphanesi olarak [JUnit 5](https://junit.org/junit5/docs/current/user-guide/) ve [AssertJ](https://assertj.github.io/doc/)'ye taşıyoruz. + +Belirli bir neden olmadığı sürece, Flink'e yeni testlerle katkıda bulunurken ve hatta mevcut testleri değiştirirken JUnit 5 ve AssertJ kullandığınızdan emin olun. Hamcrest'i, JUnit iddialarını ve `assert` yönergesini kullanmayın. +Testlerinizi okunabilir hale getirin ve AssertJ'nin veya bazı Flink modüllerinin sağladığı [özel iddialar](https://assertj.github.io/doc/#assertj-core-custom-assertions) tarafından zaten sunulan iddia mantığını çoğaltmayın. +Örneğin, şundan kaçının: + +```java +assert list.size() == 10; +for (String item : list) { + assertTrue(item.length() < 10); +} +``` + +Ve bunun yerine şunu kullanın: + +```java +assertThat(list) + .hasSize(10) + .allMatch(item -> item.length() < 10); +``` + +### Hedefli testler yazın + +* Uygulamaları değil sözleşmeleri test edin: Bileşenlerin bir dizi dahili durum değişikliğinden geçtiğini test etmek yerine, bir dizi eylemden sonra belirli bir durumda olduklarını test edin. + * Örneğin, tipik bir anti-pattern, testin bir parçası olarak belirli bir yöntemin çağrılıp çağrılmadığını kontrol etmektir. +* Bunu uygulamanın bir yolu, bir birim testi yazarken _Düzenle_, _Harekete Geç_, _İddia Et_ test yapısını takip etmeye çalışmaktır ([https://xp123.com/articles/3a-arrange-act-assert/](https://xp123.com/articles/3a-arrange-act-assert/)). + + Bu, testin mekaniğinden ziyade testin amacını (test altındaki senaryo nedir) iletmeye yardımcı olur. Teknik detaylar, test sınıfının altındaki statik yöntemlere gider.
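Yapıyı somutlaştırmak için, çerçeve ayrıntılarından bağımsız, düz Java ile yazılmış varsayımsal bir taslak (gerçek bir Flink testi JUnit 5 ve AssertJ kullanır; `wordCount` yalnızca örnek için uydurulmuş bir yardımcıdır):

```java
// Varsayımsal taslak: Düzenle / Harekete Geç / İddia Et (Arrange / Act / Assert)
// yapısını gösterir. Teknik ayrıntılar alttaki statik yardımcı yöntemde durur.
public class ArrangeActAssertSketch {

    public static void main(String[] args) {
        // Düzenle: test girdisini hazırla
        String input = "  flink  streams  flink ";

        // Harekete Geç: test edilen davranışı çalıştır
        int count = wordCount(input);

        // İddia Et: yalnızca gözlemlenebilir sonucu doğrula
        if (count != 3) {
            throw new AssertionError("beklenen 3, bulunan: " + count);
        }
        System.out.println("ok");
    }

    // Test sınıfının altında duran teknik ayrıntı (uydurma yardımcı yöntem).
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```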
+ + Bu modeli takip eden Flink'teki testlerin örnekleri şunlardır: + + * [https://github.com/apache/flink/blob/master/flink-core/src/test/java/org/apache/flink/util/LinkedOptionalMapTest.java](https://github.com/apache/flink/blob/master/flink-core/src/test/java/org/apache/flink/util/LinkedOptionalMapTest.java) + * [https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-base/src/test/java/org/apache/flink/fs/s3/common/writer/RecoverableMultiPartUploadImplTest.java](https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-base/src/test/java/org/apache/flink/fs/s3/common/writer/RecoverableMultiPartUploadImplTest.java) + + +### Mockito'dan Kaçının - Yeniden Kullanılabilir Test Uygulamaları Kullanın + +* Mockito tabanlı testler, işlevselliğin çoğaltılmasını ve etki yerine uygulamayı test etmeyi teşvik ederek uzun vadede bakımı maliyetli olma eğilimindedir. + * Daha fazla ayrıntı: [https://docs.google.com/presentation/d/1fZlTjOJscwmzYadPGl23aui6zopl94Mn5smG-rB0qT8](https://docs.google.com/presentation/d/1fZlTjOJscwmzYadPGl23aui6zopl94Mn5smG-rB0qT8) +* Bunun yerine, yeniden kullanılabilir test uygulamaları ve yardımcı programlar oluşturun. + * Bu şekilde, bazı sınıflar değiştiğinde, sadece birkaç test yardımcı programı veya sahte nesneyi güncellemek zorunda kalırız. + +### JUnit testlerinde zaman aşımlarından kaçının + +Genel olarak, JUnit testlerinde yerel zaman aşımları ayarlamaktan kaçınmalı, bunun yerine Azure'daki global zaman aşımına güvenmeliyiz. Global zaman aşımı, oluşturma (build) zaman aşımına uğramadan hemen önce iş parçacığı dökümlerini alarak hata ayıklamayı kolaylaştırır. + +Aynı zamanda, manuel olarak ayarladığınız herhangi bir zaman aşımı değeri keyfidir. Çok düşük ayarlanırsa, test kararsızlıkları elde edersiniz. "Çok düşük"ün ne anlama geldiği, donanım ve mevcut kullanım (özellikle G/Ç) gibi çok sayıda faktöre bağlıdır. Dahası, yerel bir zaman aşımı daha fazla bakım gerektirir: oluşturma ayarlanırken dikkate alınması gereken bir düğme daha demektir.
Testi biraz değiştirirseniz, zaman aşımını da iki kez kontrol etmeniz gerekir. Bu nedenle, sadece zaman aşımlarını artıran oldukça fazla commit olmuştur. + + +[^1]: Bu tür framework'leri, hata ayıklamayı kolaylaştırmak ve bağımlılık çakışmalarını önlemek için Flink'in dışında tutuyoruz. diff --git a/docs/content.tr/how-to-contribute/code-style-and-quality-components.md b/docs/content.tr/how-to-contribute/code-style-and-quality-components.md new file mode 100644 index 0000000000..1695fee1d4 --- /dev/null +++ b/docs/content.tr/how-to-contribute/code-style-and-quality-components.md @@ -0,0 +1,149 @@ +--- +title: Kod Stili ve Kalite Kılavuzu — Bileşenler Kılavuzu +bookCollapseSection: false +bookHidden: true +--- + +# Kod Stili ve Kalite Kılavuzu — Bileşenler Kılavuzu + +#### [Önsöz]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}}) +#### [Pull Request'ler ve Değişiklikler]({{< relref "how-to-contribute/code-style-and-quality-pull-requests" >}}) +#### [Genel Kodlama Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-common" >}}) +#### [Java Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-java" >}}) +#### [Scala Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-scala" >}}) +#### [Bileşenler Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-components" >}}) +#### [Biçimlendirme Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-formatting" >}}) + +## Bileşene Özel Kılavuzlar + +_Belirli bileşenlerdeki değişiklikler hakkında ek kılavuzlar._ + + +### Konfigürasyon Değişiklikleri + +Konfigürasyon seçeneği nerede olmalıdır? + +* 'flink-conf.yaml': İşler arasında standartlaştırmak isteyebileceğiniz yürütme davranışıyla ilgili tüm konfigürasyon. Bunları, "ops" şapkasını takan birinin veya diğer ekiplere bir stream processing platformu sağlayan birinin ayarlayacağı parametreler olarak düşünün.
+ +* 'ExecutionConfig': Yürütme sırasında operatörler tarafından ihtiyaç duyulan, belirli bir Flink uygulamasına özgü parametreler. Tipik örnekler watermark aralığı, serializer parametreleri, nesne yeniden kullanımıdır. +* ExecutionEnvironment (kodda): Belirli bir Flink uygulamasına özgü olan ve yalnızca program / veri akışı oluşturmak için gereken, yürütme sırasında operatörler içinde gerekmeyen her şey. + +Konfigürasyon anahtarlarının adlandırılması: + +* Konfigürasyon anahtarı adları hiyerarşik olmalıdır. + Konfigürasyonu iç içe nesneler (JSON tarzı) olarak düşünün. + + ``` + taskmanager: { + jvm-exit-on-oom: true, + network: { + detailed-metrics: false, + request-backoff: { + initial: 100, + max: 10000 + }, + memory: { + fraction: 0.1, + min: 64MB, + max: 1GB, + buffers-per-channel: 2, + floating-buffers-per-gate: 16 + } + } + } + ``` + +* Sonuç olarak konfigürasyon anahtarları şöyle olmalıdır: + + **DEĞİL** `"taskmanager.detailed.network.metrics"` + + **Bunun yerine** `"taskmanager.network.detailed-metrics"` + + +### Connector'lar + +Connector'ların uygulanması tarihsel olarak zordur ve connector'ların thread'ler, concurrency ve checkpointing'in birçok yönüyle başa çıkması gerekir.
(Streaming için sürekli dosya kaynağı vardır) + +Örnekler ayrıca saf oyuncak örnekler olmamalı, gerçek dünya kodu ile tamamen soyut örnekler arasında bir denge kurmalıdır. WordCount örneği artık oldukça eskimiş olsa da, işlevselliği vurgulayan ve yararlı şeyler yapabilen basit kodun iyi bir örneğidir. + +Örnekler ayrıca yorumlarda yoğun olmalıdır. Sınıf düzeyindeki Javadoc'ta örneğin genel fikrini açıklamalı ve kod boyunca neler olduğunu ve hangi işlevselliğin kullanıldığını açıklamalıdır. Beklenen giriş verileri ve çıkış verileri de açıklanmalıdır. + +Örnekler, `bin/flink run path/to/myExample.jar --param1 … --param2` kullanarak bir örnek çalıştırabilmeniz için parametre ayrıştırmayı içermelidir. + + +### Table & SQL API + + +#### Semantik + +**SQL standardı ana doğruluk kaynağı olmalıdır.** + +* Sözdizimi, semantik ve özellikler SQL ile uyumlu olmalıdır! +* Tekerleği yeniden icat etmemize gerek yok. Çoğu sorun endüstri genelinde zaten tartışılmış ve SQL standardında yazılmıştır. +* En yeni standarda güveniyoruz (bu belgeyi yazarken SQL:2016 veya ISO/IEC 9075:2016 ([indirme](https://standards.iso.org/ittf/PubliclyAvailableStandards/c065143_ISO_IEC_TR_19075-5_2016.zip)). Her bölüm çevrimiçi olarak mevcut değildir, ancak hızlı bir web araması burada yardımcı olabilir. + +Standarttan sapmaları veya satıcıya özgü yorumları tartışın. + +* Bir sözdizimi veya davranış bir kez tanımlandığında kolayca geri alınamaz. +* Standardı genişletmek veya yorumlamak gereken katkılar, toplulukla kapsamlı bir tartışma gerektirir. +* Lütfen, Postgres, Microsoft SQL Server, Oracle, Hive, Calcite, Beam gibi diğer satıcıların bu tür durumları nasıl ele aldığı hakkında bazı ilk araştırmaları yaparak committer'lara yardımcı olun. + + +Table API'yi SQL ve Java/Scala programlama dünyası arasında bir köprü olarak düşünün. + +* Table API, ilişkisel modeli takip eden analitik programlar için Gömülü Alana Özgü bir Dildir. 
+ Sözdizimi ve adlar açısından SQL standardını katı bir şekilde takip etmesi gerekmez, ancak daha sezgisel hissetmeye yardımcı oluyorsa, bir programlama dilinin fonksiyonları ve özellikleri adlandıracağı/yapacağı şekilde daha yakın olabilir. +* Table API'nin bazı SQL olmayan özellikleri olabilir (örn. map(), flatMap() vb.) ancak yine de "SQL gibi hissetmelidir". Mümkünse fonksiyonlar ve işlemler eşit semantik ve isimlendirmeye sahip olmalıdır. + + +#### Yaygın hatalar + +* Bir özellik eklerken SQL'in tip sistemini destekleyin. + * Bir SQL fonksiyonu, connector'ı veya formatı, en başından itibaren çoğu SQL tipini doğal olarak desteklemelidir. + * Desteklenmeyen tipler kafa karışıklığına yol açar, kullanılabilirliği sınırlar ve aynı kod yollarına birden çok kez dokunarak ek yük oluşturur. + * Örneğin, bir `SHIFT_LEFT` fonksiyonu eklerken, katkının sadece `INT` için değil, aynı zamanda `BIGINT` veya `TINYINT` için de yeterince genel olduğundan emin olun. + + +#### Test etme + +Null olabilirliği test edin. + +* SQL doğal olarak neredeyse her işlem için `NULL`'u destekler ve 3 değerli bir boolean mantığına sahiptir. +* Her özelliği null olabilirlik açısından da test ettiğinizden emin olun. + + +Tam entegrasyon testlerinden kaçının + +* Bir Flink mini-cluster'ı başlatmak ve bir SQL sorgusu için üretilen kodun derlenmesini gerçekleştirmek pahalıdır. +* Planlayıcı testleri veya API çağrılarının varyasyonları için entegrasyon testlerinden kaçının. +* Bunun yerine, bir planlayıcıdan çıkan optimize edilmiş planı doğrulayan birim testleri kullanın. Veya doğrudan bir runtime operatörünün davranışını test edin. + + +#### Uyumluluk + +Yama sürümlerinde fiziksel plan değişiklikleri getirmeyin! + +* Streaming SQL'de durum için geriye dönük uyumluluk, fiziksel yürütme planının sabit kalması gerçeğine dayanır. Aksi takdirde, oluşturulan Operatör Adları/ID'leri değişir ve durum eşleştirilemez ve geri yüklenemez. 
+* Dolayısıyla, optimize edilmiş bir streaming pipeline'ının fiziksel planında değişikliklere yol açan her hata düzeltmesi uyumluluğu bozar. +* Sonuç olarak, farklı optimizer planlarına yol açan türdeki değişiklikler şimdilik yalnızca ana sürümlerde birleştirilebilir. + + +#### Scala / Java birlikte çalışabilirliği (eski kod parçaları) + +Arayüzleri tasarlarken Java'yı aklınızda tutun. + +* Bir sınıfın gelecekte bir Java sınıfıyla etkileşime girip girmeyeceğini düşünün. +* Java kodu ile sorunsuz entegrasyon için arayüzlerde Java koleksiyonları ve Java Optional kullanın. +* Bir sınıf Java'ya dönüştürülmeye tabi tutulacaksa, yapım için .copy() veya apply() gibi case class'ların özelliklerini kullanmayın. +* Saf Scala kullanıcı odaklı API'ler, Scala ile doğal ve idiomatik ("scalaesk") entegrasyon için saf Scala koleksiyonları/yinelenebilirleri/vb. kullanmalıdır. + + diff --git a/docs/content.tr/how-to-contribute/code-style-and-quality-formatting.md b/docs/content.tr/how-to-contribute/code-style-and-quality-formatting.md new file mode 100644 index 0000000000..e55fc2bd68 --- /dev/null +++ b/docs/content.tr/how-to-contribute/code-style-and-quality-formatting.md @@ -0,0 +1,132 @@ +--- +title: Kod Stili ve Kalite Kılavuzu — Biçimlendirme Kılavuzu +bookCollapseSection: false +bookHidden: true +--- + +# Kod Stili ve Kalite Kılavuzu — Biçimlendirme Kılavuzu + +#### [Önsöz]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}}) +#### [Pull Request'ler ve Değişiklikler]({{< relref "how-to-contribute/code-style-and-quality-pull-requests" >}}) +#### [Genel Kodlama Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-common" >}}) +#### [Java Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-java" >}}) +#### [Scala Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-scala" >}}) +#### [Bileşenler Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-components" >}}) +#### [Biçimlendirme Kılavuzu]({{< relref 
"how-to-contribute/code-style-and-quality-formatting" >}}) + +## Java Kodu Biçimlendirme Stili + +IDE'yi otomatik olarak kod stilini kontrol edecek şekilde ayarlamanızı öneriyoruz. Lütfen {{< docs_link file="flink-docs-stable/docs/flinkdev/ide_setup/" name="IDE Kurulum Kılavuzu">}} sayfasını takip ederek +{{< docs_link file="flink-docs-stable/docs/flinkdev/ide_setup/#code-formatting" name="spotless">}} ve +{{< docs_link file="flink-docs-stable/docs/flinkdev/ide_setup/#checkstyle-for-java" name="checkstyle">}} araçlarını kurun. + +### Lisans + +* **Apache lisans başlıkları.** Dosyalarınızda Apache Lisans başlıklarının olduğundan emin olun. Kodu derlediğinizde RAT eklentisi bunu kontrol eder. + +### Import İfadeleri + +* **Paket bildirimi öncesinde ve sonrasında boş satır.** +* **Kullanılmayan import ifadeleri olmamalı.** +* **Gereksiz import ifadeleri olmamalı.** +* **Wildcard import ifadeleri olmamalı.** Bunlar, koda ekleme yaparken ve bazı durumlarda yeniden düzenleme sırasında bile sorunlara neden olabilir. +* **Import sıralaması.** Import ifadeleri alfabetik olarak sıralanmalı, aşağıdaki bloklara ayrılmalı ve her blok arasında boş bir satır olmalıdır: + * <org.apache.flink.* 'den importlar> + * <org.apache.flink.shaded.* 'den importlar> + * <diğer kütüphanelerden importlar> + * <javax.* 'den importlar> + * <java.* 'den importlar> + * <scala.* 'den importlar> + * <static importlar> + + +### İsimlendirme + +* **Paket adları bir harfle başlamalı ve büyük harf veya özel karakterler içermemelidir.** + **Non-private static final alanlar büyük harf olmalı ve kelimeler alt çizgilerle ayrılmalıdır.**(`MY_STATIC_VARIABLE`) +* **Non-static alanlar/metodlar küçük deve (camel) case olmalıdır.** (`myNonStaticField`) + + +### Boşluklar + +* **Tablar ve boşluklar.** Girinti için tab yerine boşluk kullanıyoruz. 
+* **Satır sonunda boşluk olmamalı.** +* **Operatörler/anahtar kelimeler etrafında boşluklar.** Operatörler (`+`, `=`, `>`, …) ve anahtar kelimeler (`if`, `for`, `catch`, …) satırın başında veya sonunda olmadıkları sürece önlerinde ve arkalarında bir boşluk olmalıdır. + + +### Uzun İfadelerde Satır Düzenleme Kuralları + +Genel olarak, kod okunabilirliğini artırmak için uzun satırlardan kaçınılmalıdır. Aynı soyutlama düzeyinde çalışan kısa ifadeler kullanmaya çalışın. Uzun ifadeleri, daha fazla yerel değişken tanımlayarak, yardımcı metodlar oluşturarak vb. yollarla kısaltın. + +Uzun satırların iki temel kaynağı şunlardır: + +* **Fonksiyon tanımında veya çağrısında uzun argüman listesi**: `void func(type1 arg1, type2 arg2, ...)` +* **Uzun zincirleme metod çağrı dizisi**: `list.stream().map(...).reduce(...).collect(...)...` + +Uzun satırları kırma kuralları: + +* Satır uzunluk sınırını aşıyorsa veya kırmanın kod okunabilirliğini artıracağını düşünüyorsanız argüman listesini veya çağrı zincirini kırın +* Bir satırı kırdığınızda, ilk argüman/çağrı da dahil olmak üzere her argüman/çağrı ayrı bir satırda olmalıdır +* Her yeni satır, üst fonksiyon adının veya çağrılan öğenin satırına göre bir ek girinti (fonksiyon tanımı için iki) içermelidir + +Fonksiyon argümanları için ek kurallar: + +* Açılış parantezi her zaman üst fonksiyon adının bulunduğu satırda kalır +* Olası fırlatılan istisna listesi asla kırılmaz ve satır uzunluğu limitini aşsa bile aynı son satırda kalır +* Son argüman hariç, fonksiyon argümanının bulunduğu her satır aynı satırda kalan bir virgülle bitmelidir + +Fonksiyon argümanları listesini kırma örneği: + +``` +public void func( + int arg1, + int arg2, + ...) throws E1, E2, E3 { + +} +``` + +Zincirleme bir çağrıda nokta işareti her zaman, o zincirleme çağrının kendi satırında, çağrının başında yer alır. + +Zincirleme çağrılar listesini kırma örneği: + +``` +values + .stream() + .map(...) 
+ .collect(...); +``` + + +### Süslü Parantezler + +* **Sol süslü parantezler ({) yeni bir satıra yerleştirilmemelidir.** +* Sağ süslü parantezler (}) her zaman satırın başına yerleştirilmelidir. +* Bloklar. if, for, while, do, … gibi ifadelerden sonra gelen tüm ifadeler, her zaman süslü parantezlerle bir blok içinde kapsüllenmelidir (blok bir ifade içerse bile). + + +### Javadoc'lar + +* **Tüm public/protected metodlar ve sınıflar bir Javadoc'a sahip olmalıdır.** +* **Javadoc'un ilk cümlesi bir nokta ile bitmelidir.** +* **Paragraflar yeni bir satırla ayrılmalı ve
`<p>`
ile başlamalıdır.** + + +### Erişim Belirteçleri (Modifiers) + +* **Gereksiz erişim belirteçleri olmamalı.** Örneğin, interface metodlarında public erişim belirteçleri. +* **JLS3 erişim belirteci sıralamasını takip edin.** Erişim belirteçleri şu sırayla düzenlenmelidir: public, protected, private, abstract, static, final, transient, volatile, synchronized, native, strictfp. + + +### Dosyalar + +* **Tüm dosyalar \n ile bitmelidir.** +* Dosya uzunluğu 3000 satırı geçmemelidir. + + +### Çeşitli + +* **Diziler Java tarzında tanımlanmalıdır.** Örneğin, `public String[] array`. +* **Flink Preconditions kullanın.** Homojenliği artırmak için, Apache Commons Validate veya Google Guava yerine tutarlı bir şekilde `org.apache.flink.Preconditions` metodları olan `checkNotNull` ve `checkArgument` kullanın. + +[^1]: Bu tür framework'leri hata ayıklamayı kolaylaştırmak ve bağımlılık çakışmalarını önlemek için Flink'in dışında tutuyoruz. diff --git a/docs/content.tr/how-to-contribute/code-style-and-quality-java.md b/docs/content.tr/how-to-contribute/code-style-and-quality-java.md new file mode 100644 index 0000000000..99758a29bb --- /dev/null +++ b/docs/content.tr/how-to-contribute/code-style-and-quality-java.md @@ -0,0 +1,120 @@ +--- +title: Kod Stili ve Kalite Kılavuzu — Java +bookCollapseSection: false +bookHidden: true +--- + +# Kod Stili ve Kalite Kılavuzu — Java + +#### [Önsöz]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}}) +#### [Pull Request'ler ve Değişiklikler]({{< relref "how-to-contribute/code-style-and-quality-pull-requests" >}}) +#### [Genel Kodlama Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-common" >}}) +#### [Java Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-java" >}}) +#### [Scala Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-scala" >}}) +#### [Bileşenler Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-components" >}}) +#### [Biçimlendirme Kılavuzu]({{< relref 
"how-to-contribute/code-style-and-quality-formatting" >}}) + +## Java Dili Özellikleri ve Kütüphaneleri + + +### Ön Koşullar ve Loglama İfadeleri + +* Parametrelerde asla dizeleri birleştirmeyin + * Yapmayın: `Preconditions.checkState(value <= threshold, "value must be below " + threshold)` + * Yapmayın: `LOG.debug("value is " + value)` + * Yapın: `Preconditions.checkState(value <= threshold, "value must be below %s", threshold)` + * Yapın: `LOG.debug("value is {}", value)` + + +### Generics + +* **Raw type kullanmayın:** Kesinlikle gerekli olmadıkça raw type kullanmayın (bazen imza eşleşmeleri, diziler için gereklidir). +* **Kontrol edilmemiş dönüşümler için uyarıları bastırın:** Kaçınılamıyorsa uyarıları bastırmak için annotation ekleyin (örneğin "unchecked" veya "serial"). Aksi takdirde, generics hakkındaki uyarılar yapıyı doldurur ve ilgili uyarıları boğar. + + +### equals() / hashCode() + +* **equals() / hashCode() yalnızca iyi tanımlandıklarında eklenmelidir.** +* İyi tanımlanmadıkları zaman **testlerde daha basit bir assertion sağlamak için eklenmemelidir**. Bu durumda hamcrest matcher'larını kullanın: [https://github.com/junit-team/junit4/wiki/matchers-and-assertthat](https://github.com/junit-team/junit4/wiki/matchers-and-assertthat) +* Yöntemlerin iyi tanımlanmadığının yaygın bir göstergesi, alanların bir alt kümesini dikkate almalarıdır (tamamen yardımcı alanlar dışında). +* Yöntemler mutable alanları dikkate aldığında, genellikle bir tasarım sorununuz vardır. `equals()`/`hashCode()` yöntemleri, türü bir anahtar olarak kullanmayı önerir, ancak imzalar türü değiştirmeye devam etmenin güvenli olduğunu gösterir. + + +### Java Serialization + +* **Hiçbir şey için Java Serialization kullanmayın !!!** +* **Hiçbir şey için Java Serialization kullanmayın !!! !!!** +* **Hiçbir şey için Java Serialization kullanmayın !!! !!! !!!** +* Flink içinde, Java serialization, mesajları ve programları RPC aracılığıyla taşımak için kullanılır. 
Bu, Java serialization kullandığımız tek durumdur. Bu nedenle, bazı sınıfların serializable olması gerekir (RPC aracılığıyla taşınıyorlarsa). +* **Serializable sınıflar bir Serial Version UID tanımlamalıdır:** + + `private static final long serialVersionUID = 1L;` +* **Yeni sınıflar için Serial Version UID 1'den başlamalıdır** ve genellikle Java serialization uyumluluğu tanımına göre sınıfta yapılan her uyumsuz değişiklikte artırılmalıdır (örneğin: bir alanın türünü değiştirmek veya bir sınıfı sınıf hiyerarşisinde taşımak). + + +### Java Reflection + +**Java'nın Reflection API'sini kullanmaktan kaçının** + +* Java'nın Reflection API'si belirli durumlarda çok kullanışlı bir araç olabilir, ancak her durumda bir hack'tir ve alternatifler araştırılmalıdır. Flink'in reflection kullanması gereken tek durumlar şunlardır: + * Başka bir modülden dinamik olarak implementasyonları yükleme (web UI, ek serializer'lar, pluggable query processor'lar gibi). + * TypeExtractor sınıfı içinde türleri çıkarma. Bu yeterince kırılgandır ve TypeExtractor sınıfının dışında yapılmamalıdır. + * Bir sınıfın/metodun tüm sürümlerde bulunduğunu varsayamadığımız için reflection kullanmamız gereken, JDK sürümleri arası özelliklerin bazı durumları. +* Testlerde metodlara veya alanlara erişmek için reflection'a ihtiyacınız varsa, bu genellikle daha derin mimari sorunlara işaret eder, örneğin yanlış scoping, ilgi alanlarının kötü ayrımı veya test edilen sınıfa component'ler/dependency'ler sağlamanın temiz bir yolunun olmadığı durumlar. 
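Yukarıda sayılan kabul edilebilir kullanımlardan ilkini (implementasyonları dinamik olarak yükleme) gösteren, yalnızca fikir vermek amacıyla yazılmış varsayımsal bir taslak; `Plugin`, `EchoPlugin` ve `PluginLoader` adları uydurmadır:

```java
// Varsayımsal taslak: reflection'ın kabul edilebilir kullanımlarından biri olan
// "başka bir modülden dinamik olarak implementasyon yükleme" desenini gösterir.
interface Plugin {
    String name();
}

// Normalde başka bir modülde yaşayacak örnek implementasyon.
class EchoPlugin implements Plugin {
    public String name() { return "echo"; }
}

public class PluginLoader {
    static Plugin load(String className) throws ReflectiveOperationException {
        // Sınıfı adıyla yükle ve parametresiz kurucusuyla örnekle.
        return (Plugin) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Plugin p = load("EchoPlugin");
        System.out.println(p.name()); // echo
    }
}
```

Reflection bu tek noktada kalır; kodun geri kalanı yalnızca `Plugin` arayüzüyle çalışır.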
+ + +### Collections + +* **ArrayList ve ArrayDeque, listenin ortasında sık sık ekleme ve silme yapıldığı durumlar dışında neredeyse her zaman LinkedList'ten üstündür.** +* **Map'ler için, birden çok lookup gerektiren pattern'lardan kaçının** + * `get()` öncesinde `contains()` → `get()` ve null kontrolü + * `put()` öncesinde `contains()` → `putIfAbsent()` veya `computeIfAbsent()` + * Key'ler üzerinde yineleme, value'ları alma → `entrySet()` üzerinde yineleme +* **Bir collection için initial capacity'i yalnızca bunun için iyi kanıtlanmış bir neden varsa ayarlayın**, aksi takdirde kodu karıştırmayın. **Map'ler** durumunda bu daha da yanıltıcı olabilir çünkü Map'in load factor'ü etkili bir şekilde capacity'i azaltır. + + +### Java Optional + +* Nullable değerler için `Optional` kullanmadığınız yerlerde **@Nullable annotation kullanın**. +* `Optional` kullanımının kritik kodda **performance degradation'a yol açacağını kanıtlayabiliyorsanız, @Nullable'a fallback yapın**. +* Kanıtlanmış bir performans endişesi durumu dışında, API/public method'larda **nullable değerleri döndürmek için her zaman Optional kullanın**. +* **Optional'ı bir fonksiyon argümanı olarak kullanmayın**; bunun yerine ya metodu overload edin ya da fonksiyon argümanları seti için Builder pattern kullanın. + * Not: Kodun basitleştirildiğine inanıyorsanız, private helper method'da bir Optional argümanına izin verilebilir + ([örnek](https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95)). +* **Class field'ları için Optional kullanmayın**. + + +### Lambda Expressions + +* Non-capturing lambda'ları tercih edin (dış scope'daki referansları içermeyen lambda'lar). Capturing lambda'lar her çağrı için yeni bir nesne instance'ı oluşturmak zorundadır. Non-capturing lambda'lar, her invocation için aynı instance'ı kullanabilir.
+ + **yapmayın:** + ``` + map.computeIfAbsent(key, x -> key.toLowerCase()) + ``` + + **yapın:** + ``` + map.computeIfAbsent(key, k -> k.toLowerCase()); + ``` + +* Inline lambda'lar yerine method reference'ları düşünün + + **yapmayın**: + ``` + map.computeIfAbsent(key, k-> Loader.load(k)); + ``` + + **yapın:** + ``` + map.computeIfAbsent(key, Loader::load); + ``` + + +### Java Streams + +* Performance-critical olan herhangi bir kodda Java Streams kullanmaktan kaçının. +* Java Streams kullanmanın ana motivasyonu, kod readability'sini geliştirmek olmalıdır. Bu nedenle, data-intensive olmayan, ancak koordinasyonla ilgilenen kod parçaları için iyi bir match olabilirler. +* İkinci durumda bile, scope'u bir metod ile veya bir internal class içindeki birkaç private metod ile sınırlamaya çalışın. + + diff --git a/docs/content.tr/how-to-contribute/code-style-and-quality-preamble.md b/docs/content.tr/how-to-contribute/code-style-and-quality-preamble.md new file mode 100644 index 0000000000..7537fa0420 --- /dev/null +++ b/docs/content.tr/how-to-contribute/code-style-and-quality-preamble.md @@ -0,0 +1,31 @@ +--- +title: Kod Stili ve Kalite Kılavuzu +bookCollapseSection: false +weight: 19 +--- + +# Apache Flink Kod Stili ve Kalite Kılavuzu + +#### [Önsöz]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}}) +#### [Pull Request'ler ve Değişiklikler]({{< relref "how-to-contribute/code-style-and-quality-pull-requests" >}}) +#### [Genel Kodlama Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-common" >}}) +#### [Java Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-java" >}}) +#### [Scala Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-scala" >}}) +#### [Bileşenler Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-components" >}}) +#### [Biçimlendirme Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-formatting" >}}) + +


+ +Bu, sürdürmek istediğimiz kod ve kalite standardını yakalama girişimidir. + +Bir kod katkısı (veya herhangi bir kod parçası) çeşitli şekillerde değerlendirilebilir: Özellik kümelerinden biri, kodun doğru ve verimli olup olmadığıdır. Bu, _mantıksal veya algoritmik problemi_ doğru ve iyi bir şekilde çözmeyi gerektirir. + +Diğer bir özellik kümesi ise, kodun sezgisel bir tasarım ve mimariyi takip edip etmediği, iyi yapılandırılmış ve doğru ilgi ayrımına sahip olup olmadığı ve kodun kolayca anlaşılabilir olup olmadığı ve varsayımlarını açık hale getirip getirmediğidir. Bu özellik kümesi, _yazılım mühendisliği problemini_ iyi çözmeyi gerektirir. İyi bir çözüm, kodun kolayca test edilebilir, orijinal yazarlarından başka kişiler tarafından da bakımının yapılabilir (çünkü yanlışlıkla bozmak daha zordur) ve geliştirmek için verimli olduğu anlamına gelir. + +İlk özellik kümesinin oldukça nesnel onay kriterleri varken, ikinci özellik kümesini değerlendirmek çok daha zordur, ancak Apache Flink gibi bir açık kaynak projesi için büyük önem taşır. Kod tabanını birçok katkıda bulunana davet etmek, katkıları orijinal kodu yazmayan geliştiriciler için anlaşılması kolay hale getirmek ve kodu birçok katkı karşısında sağlam tutmak için, iyi tasarlanmış kod çok önemlidir.[^1] İyi tasarlanmış kod için, zaman içinde doğru ve hızlı kalmasını sağlamak daha kolaydır. + +Bu elbette iyi tasarlanmış kod nasıl yazılır konusunda tam bir kılavuz değildir. Bunu yakalamaya çalışan büyük kitapların dünyası var. Bu kılavuz, Flink'i geliştirme bağlamında gözlemlediğimiz en iyi uygulamaların, desenlerin, anti-desenlerin ve yaygın hataların bir kontrol listesi olarak düşünülmüştür. + +Yüksek kaliteli açık kaynak katkılarının büyük bir kısmı, inceleyicinin katkıyı anlamasına ve etkileri çift kontrol etmesine yardımcı olmakla ilgilidir, bu nedenle bu kılavuzun önemli bir kısmı, bir pull request'i inceleme için nasıl yapılandıracağınızla ilgilidir. 
+ +[^1]: Daha önceki günlerde, biz (Flink topluluğu) buna her zaman yeterince dikkat etmedik, bu da Flink'in bazı bileşenlerinin geliştirilmesini ve katkıda bulunulmasını zorlaştırdı. diff --git a/docs/content.tr/how-to-contribute/code-style-and-quality-pull-requests.md b/docs/content.tr/how-to-contribute/code-style-and-quality-pull-requests.md new file mode 100644 index 0000000000..b3da75996d --- /dev/null +++ b/docs/content.tr/how-to-contribute/code-style-and-quality-pull-requests.md @@ -0,0 +1,100 @@ +--- +title: Kod Stili ve Kalite Kılavuzu — Pull Request'ler ve Değişiklikler +bookCollapseSection: false +bookHidden: true +--- + +# Kod Stili ve Kalite Kılavuzu — Pull Request'ler ve Değişiklikler + +#### [Önsöz]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}}) +#### [Pull Request'ler ve Değişiklikler]({{< relref "how-to-contribute/code-style-and-quality-pull-requests" >}}) +#### [Genel Kodlama Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-common" >}}) +#### [Java Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-java" >}}) +#### [Scala Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-scala" >}}) +#### [Bileşenler Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-components" >}}) +#### [Biçimlendirme Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-formatting" >}}) + +
+
+**Gerekçe:** Katkıda bulunanlardan, pull request'lerini daha kolay ve daha kapsamlı bir şekilde incelenebilecek bir duruma getirmek için biraz ekstra çaba harcamalarını istiyoruz. Bu, topluluğa birçok açıdan yardımcı olur:
+
+* İncelemeler çok daha hızlıdır ve böylece katkılar daha erken birleştirilir.
+* Katkılardaki daha az sorunu gözden kaçırarak daha yüksek kod kalitesi sağlayabiliriz.
+* Committer'lar aynı sürede daha fazla katkıyı inceleyebilir; bu da Flink'in yaşadığı yüksek katkı oranıyla başa çıkmaya yardımcı olur.
+
+Lütfen bu kılavuzu takip etmeyen katkıların incelenmesinin daha uzun süreceğini ve bu nedenle genellikle topluluk tarafından daha düşük öncelikle ele alınacağını anlayın. Bu kötü niyet değildir; yapılandırılmamış pull request'leri incelemenin eklediği karmaşıklıktan kaynaklanmaktadır.
+
+
+## 1. JIRA Sorunu ve İsimlendirme
+
+Pull request'in bir [JIRA sorununa](https://issues.apache.org/jira/projects/FLINK/issues) karşılık geldiğinden emin olun.
+
+İstisnalar, JavaDoc'lardaki veya dokümantasyon dosyalarındaki yazım hatalarının düzeltilmesi gibi **acil düzeltmelerdir (hotfix)**.
+
+Pull request'i `[FLINK-XXXX][bileşen] Pull request'in başlığı` şeklinde adlandırın; burada `FLINK-XXXX` gerçek sorun numarasıyla değiştirilmelidir. Bileşenler, JIRA sorununda kullanılanlarla aynı olmalıdır.
+
+Acil düzeltmeler, örneğin `[hotfix][docs] Olay zamanı tanıtımındaki yazım hatasını düzelt` veya `[hotfix][javadocs] PuncuatedWatermarkGenerator için JavaDoc'u genişlet` şeklinde adlandırılmalıdır.
+
+
+## 2. Açıklama
+
+Katkıyı tanımlamak için lütfen pull request şablonunu doldurun. İnceleyen kişinin sorunu ve çözümü yalnızca koddan değil, açıklamadan da anlayabileceği şekilde açıklayın.
+
+İyi açıklanmış bir pull request'in mükemmel bir örneği: [https://github.com/apache/flink/pull/7264](https://github.com/apache/flink/pull/7264)
+
+Açıklamanın, PR tarafından çözülen problem için yeterli olduğundan emin olun.
Küçük değişiklikler uzun bir açıklama metni gerektirmez. İdeal durumda sorun zaten Jira sorununda açıklanmıştır ve açıklama büyük ölçüde oradan kopyalanabilir.
+
+Uygulama sırasında ek açık sorular/sorunlar keşfettiyseniz ve bunlarla ilgili bir seçim yaptıysanız, bunları pull request metninde açıklayın ki inceleyenler varsayımları tekrar kontrol edebilsin. Bir örnek, [https://github.com/apache/flink/pull/8290](https://github.com/apache/flink/pull/8290) içinde ("Açık Mimari Soruları" bölümü) bulunabilir.
+
+
+## 3. Refactoring, Temizleme ve Bağımsız Değişiklikleri Ayırma
+
+**NOT: Bu bir optimizasyon değil, kritik bir gerekliliktir.**
+
+Pull request'ler temizlik, refactoring ve temel değişiklikleri ayrı commit'lere koymalıdır. Bu şekilde inceleyici, temizlik ve refactoring'e bağımsız olarak bakabilir ve bu değişikliklerin davranışı değiştirmediğinden emin olabilir. Ardından temel değişikliklere izole olarak (diğer değişikliklerin gürültüsü olmadan) bakabilir ve bunun temiz ve sağlam bir değişiklik olduğundan emin olabilir.
+
+Kesinlikle ayrı bir commit'e gitmesi gereken değişiklik örnekleri şunlardır:
+
+* Önceden var olan koddaki temizlik, stil düzeltmeleri ve uyarı gidermeleri
+* Paketleri, sınıfları veya yöntemleri yeniden adlandırma
+* Kodu (başka paketlere veya sınıflara) taşıma
+* Yapıyı refactor etme veya tasarım desenlerini değiştirme
+* İlgili testleri veya yardımcı programları birleştirme
+* Mevcut testlerdeki varsayımları değiştirme (değiştirilen varsayımların neden mantıklı olduğunu açıklayan bir commit mesajı ekleyin)
+
+Aynı PR'ın önceki commit'lerinde tanıtılan sorunları düzelten temizleme commit'leri olmamalıdır. Commit'ler kendi içinde temiz olmalıdır.
+
+Ek olarak, daha büyük her katkı, değişiklikleri bağımsız olarak incelenebilecek bir dizi bağımsız değişikliğe bölmelidir.
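"Commit'ler kendi içinde temiz olmalıdır" ilkesi için kabaca bir ön kontrol, basit bir kabuk fonksiyonuyla taslak halinde gösterilebilir. Aşağıdaki `has_fixup` fonksiyonunun adı, çıktı metinleri ve kullanılan FLINK-12345 sorun numarası tamamen bu örneğe özgü varsayımlardır; fonksiyon yalnızca bir commit konu satırının (`git rebase --autosquash` ile birleştirilmek üzere işaretlenmiş) `fixup!`/`squash!` öneklerini taşıyıp taşımadığına bakar:

```shell
# Varsayımsal yardımcı: konu satırı "fixup!" veya "squash!" ile başlıyorsa,
# commit'in aynı PR içindeki önceki bir commit'i düzelttiğini, yani
# geçmişin PR açılmadan önce hâlâ temizlenmesi gerektiğini varsayar.
has_fixup() {
  case "$1" in
    "fixup!"*|"squash!"*) echo "temiz degil" ;;
    *) echo "temiz" ;;
  esac
}

has_fixup "fixup! [FLINK-12345][runtime] Onceki commit'i duzelt"   # → temiz degil
has_fixup "[FLINK-12345][runtime] Yeni davranisi uygula"           # → temiz
```

Bu tür commit'ler, pull request açılmadan önce örneğin `git rebase -i` ile ilgili oldukları commit'lere katlanarak temizlenebilir.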
+
+Sorunları ayrı commit'lere bölmenin iki harika örneği şunlardır:
+
+* [https://github.com/apache/flink/pull/6692](https://github.com/apache/flink/pull/6692) (temizleme ve refactoring'i ana değişikliklerden ayırır)
+* [https://github.com/apache/flink/pull/7264](https://github.com/apache/flink/pull/7264) (ayrıca ana değişiklikleri bağımsız olarak incelenebilir parçalara böler)
+
+Bir pull request yine de büyük commit'ler içeriyorsa (örneğin 1000'den fazla değiştirilmiş satırı olan bir commit), yukarıdaki örneklerde olduğu gibi commit'i birden çok alt probleme nasıl bölebileceğinizi düşünmek faydalı olabilir.
+
+
+## 4. Commit İsimlendirme Kuralları
+
+Commit mesajları, pull request'in tamamına benzer bir kalıbı takip etmelidir:
+`[FLINK-XXXX][bileşen] Commit açıklaması`.
+
+Bazı durumlarda sorun bir alt görev olabilir ve bileşen, pull request'in ana bileşeninden farklı olabilir. Örneğin commit, bir çalışma zamanı değişikliği için uçtan uca bir test getiriyorsa, PR `[runtime]` olarak etiketlenirken bireysel commit `[e2e]` olarak etiketlenebilir.
+
+Commit mesajları için örnekler:
+
+* `[hotfix] Sürüm son eklerine izin vermek için update_branch_version.sh düzeltildi`
+* `[hotfix] [table] Kullanılmayan geometri bağımlılığını kaldır`
+* `[FLINK-11704][tests] AbstractCheckpointStateOutputStreamTestBase'i geliştir`
+* `[FLINK-10569][runtime] ExecutionVertexCancelTest'te Instance kullanımını kaldır`
+* `[FLINK-11702][table-planner-blink] Yeni bir tablo tip sistemi tanıt`
+
+
+## 5. Sistemin Gözlemlenebilir Davranışındaki Değişiklikler
+
+Katkıda bulunanlar, PR'larında Flink'in gözlemlenebilir davranışını herhangi bir şekilde bozan değişikliklerin farkında olmalıdır; çünkü birçok durumda bu tür değişiklikler mevcut kurulumları bozabilir. Kodlama sırasında veya incelemelerde bu açıdan şüphe uyandırması gereken kırmızı bayraklara örnekler:
+
+* Bozucu değişiklikle birlikte testlerin tekrar geçmesini sağlamak için assertion'lar (iddialar) değiştirilmiştir.
+* Mevcut testlerin geçmeye devam etmesi için yapılandırma ayarının aniden (varsayılan olmayan) değerlere ayarlanması gerekir. Bu, özellikle bozucu bir varsayılana sahip yeni ayarlar için olabilir. +* Mevcut komut dosyalarının veya yapılandırmaların ayarlanması gerekir. diff --git a/docs/content.tr/how-to-contribute/code-style-and-quality-scala.md b/docs/content.tr/how-to-contribute/code-style-and-quality-scala.md new file mode 100644 index 0000000000..11bceb2b56 --- /dev/null +++ b/docs/content.tr/how-to-contribute/code-style-and-quality-scala.md @@ -0,0 +1,82 @@ +--- +title: Kod Stili ve Kalite Kılavuzu — Scala +bookCollapseSection: false +bookHidden: true +--- + +# Kod Stili ve Kalite Kılavuzu — Scala + +#### [Önsöz]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}}) +#### [Pull Request'ler ve Değişiklikler]({{< relref "how-to-contribute/code-style-and-quality-pull-requests" >}}) +#### [Genel Kodlama Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-common" >}}) +#### [Java Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-java" >}}) +#### [Scala Dili Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-scala" >}}) +#### [Bileşenler Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-components" >}}) +#### [Biçimlendirme Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-formatting" >}}) + +## Scala Dili Özellikleri + +### Scala'nın Kullanılacağı (ve Kullanılmayacağı) Yerler + +**Scala'yı Scala API'leri veya saf Scala kütüphaneleri için kullanırız.** + +**Temel API'lerde ve runtime bileşenlerinde Scala kullanmıyoruz. Bu bileşenlerden mevcut Scala kullanımını (kod ve dependency'ler) kaldırmayı hedefliyoruz.** + +⇒ Bu Scala'yı sevmediğimizden değil, "doğru iş için doğru araç" yaklaşımının bir sonucudur (aşağıya bakın). + +API'ler için, temeli Java'da geliştiririz ve Scala'yı üzerine katmanlarız. 
+
+* Bu, geleneksel olarak hem Java hem de Scala için en iyi birlikte çalışabilirliği sağlamıştır
+* Bu, Scala API'sini güncel tutmanın özel çaba gerektirdiği anlamına gelir
+
+Neden temel API'lerde ve runtime'da Scala kullanmıyoruz?
+
+* Geçmiş, Scala'nın işlevsellikte uyumluluğu bozan değişikliklerle çok hızlı evrildiğini göstermiştir. Her Scala sürüm yükseltmesi, Flink topluluğu için oldukça büyük çaba gerektiren bir süreç olmuştur.
+* Scala, Java sınıflarıyla her zaman iyi etkileşim kurmaz; örneğin Scala'nın görünürlük kapsamları farklı çalışır ve genellikle Java kullanıcılarına istenenden daha fazla erişim sağlar.
+* Scala, artifact/dependency yönetimine ek bir karmaşıklık katmanı ekler.
+  * Runtime'da Akka gibi Scala'ya bağlı kütüphaneleri tutmak isteyebiliriz; ancak bunları korumalı tutmak ve sürüm çakışmalarını önlemek için bir arayüz arkasına soyutlamak ve ayrı bir classloader ile yüklemek gerekebilir.
+* Scala, bilgili Scala programcılarının, Scala konusunda daha az bilgili programcılar için anlaşılması çok zor kod yazmasını çok kolaylaştırır. Bu, çeşitli deneyim seviyelerine sahip geniş bir topluluğu olan bir açık kaynak projesi için özellikle zorlayıcıdır. Bununla başa çıkmak, Scala özellik kümesini büyük ölçüde kısıtlamak anlamına gelir; bu da Scala'yı kullanmanın asıl amacının önemli bir kısmını ortadan kaldırır.
+
+
+### API Eşitliği
+
+Java API ve Scala API'yi işlevsellik ve kod kalitesi açısından senkronize tutun.
+
+Scala API, Java API'lerinin tüm özelliklerini de kapsamalıdır.
+
+Scala API'leri, DataStream API'deki aşağıdaki örnekte olduğu gibi bir "tamlık testi"ne sahip olmalıdır: [https://github.com/apache/flink/blob/master/flink-streaming-scala/src/test/scala/org/apache/flink/streaming/api/scala/StreamingScalaAPICompletenessTest.scala](https://github.com/apache/flink/blob/master/flink-streaming-scala/src/test/scala/org/apache/flink/streaming/api/scala/StreamingScalaAPICompletenessTest.scala)
+
+
+### Dil Özellikleri
+
+* **Scala implicit'lerinden kaçının.**
+  * Scala'nın implicit'leri yalnızca Table API expression'ları veya type information extraction gibi kullanıcıya dönük API iyileştirmeleri için kullanılmalıdır.
+  * Bunları dahili "sihir" için kullanmayın.
+* **Class üyeleri için açık tip belirtin.**
+  * Class field'ları ve method dönüş tipleri için implicit tip çıkarımına güvenmeyin:
+
+    **Yapmayın:**
+    ```scala
+    var expressions = new java.util.ArrayList[String]()
+    ```
+
+    **Yapın:**
+    ```scala
+    var expressions: java.util.List[String] = new java.util.ArrayList[String]()
+    ```
+
+  * Stack'teki yerel değişkenler için tip çıkarımı kullanmak sorun değildir.
+* **Katı görünürlük kullanın.**
+  * Scala'nın paket-özel özelliklerinden (`private[flink]` gibi) kaçının ve bunun yerine normal `private`/`protected` kullanın.
+  * `private[flink]` ve `protected` üyelerin Java'da public olduğunu unutmayın.
+  * `private[flink]`'in, Flink tarafından sağlanan örnekler (instance) üzerinde yine de tüm üyeleri açığa çıkardığını unutmayın.
+
+
+### Kod Biçimlendirme
+
+**Kodunuzu yapılandırmak için satır kaydırmayı kullanın.**
+
+* Scala'nın fonksiyonel doğası, uzun dönüşüm zincirlerine izin verir (`x.map().map().foreach()`).
+* Geliştiricileri kodlarını yapılandırmaya zorlamak için satır uzunluğu 100 karakterle sınırlıdır.
+* Daha iyi sürdürülebilirlik için dönüşüm başına bir satır kullanın.
+
diff --git a/docs/content.tr/how-to-contribute/contribute-code.md b/docs/content.tr/how-to-contribute/contribute-code.md
new file mode 100644
index 0000000000..a9fa54f71d
--- /dev/null
+++ b/docs/content.tr/how-to-contribute/contribute-code.md
@@ -0,0 +1,236 @@
+---
+title: Kod Katkısında Bulunma
+bookCollapseSection: false
+weight: 17
+---
+
+# Kod Katkısında Bulunma
+
+Apache Flink, gönüllülerin kod katkılarıyla bakımı yapılan, geliştirilen ve genişletilen bir projedir. Flink'e yapılan katkıları memnuniyetle karşılıyoruz; ancak projenin büyüklüğü nedeniyle ve kod tabanının yüksek kalitesini korumak amacıyla bu belgede açıklanan katkı sürecini takip ediyoruz.
+
+**Lütfen istediğiniz zaman soru sormaktan çekinmeyin.** [Geliştirici e-posta listesine]({{< relref "community" >}}#mailing-lists) bir e-posta gönderin veya üzerinde çalıştığınız Jira sorununa yorum yapın.
+
+**ÖNEMLİ**: Kod katkısında bulunmaya başlamadan önce lütfen bu belgeyi dikkatlice okuyun ve aşağıda açıklanan süreci ve yönergeleri izleyin. Apache Flink'e katkıda bulunmak, bir pull request açmakla başlamaz. Katkıda bulunanların önce bizimle iletişime geçerek genel yaklaşımı birlikte tartışmalarını bekliyoruz. Flink committer'ları ile fikir birliği olmadan katkılar önemli ölçüde yeniden çalışma gerektirebilir veya hiç incelenmeyebilir.
+
+## Katkıda bulunacak bir şey mi arıyorsunuz?
+
+Katkı için iyi bir fikriniz varsa, doğrudan [kod katkı sürecine](#code-contribution-process) geçebilirsiniz.
+Katkıda bulunabileceğiniz bir konu arıyorsanız, [Flink'in hata izleyicisindeki]({{< relref "community" >}}#issue-tracker) atanmamış açık Jira sorunlarına göz atabilir ve ardından [kod katkı sürecini](#code-contribution-process) takip edebilirsiniz. Flink projesinde çok yeniyseniz ve proje ile katkı süreci hakkında bilgi edinmek istiyorsanız, _starter_ etiketi ile işaretlenmiş [başlangıç sorunlarını](https://issues.apache.org/jira/issues/?filter=12349196) inceleyebilirsiniz.
+
+## Kod Katkı Süreci
+
+1. **Tartış:** Bir Jira bileti veya e-posta listesi tartışması oluşturun ve fikir birliğine varın. Biletin önemi, ilgisi ve kapsamı konusunda anlaşın, uygulama yaklaşımını tartışın ve değişikliği incelemeye ve birleştirmeye istekli bir committer bulun. Jira biletlerini yalnızca committer'lar atayabilir.
+
+2. **Uygula:** Değişikliği [Kod Stili ve Kalite Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}})'na ve Jira biletinde üzerinde anlaşılan yaklaşıma göre uygulayın. Yaklaşım konusunda fikir birliği varsa (örneğin bilet size atanmışsa) uygulamaya başlayın.
+
+3. **İnceleme:** Bir pull request açın ve inceleyici ile çalışın. Atanmamış Jira biletlerine ait veya atanan kişi tarafından yazılmamış pull request'ler topluluk tarafından incelenmeyecek veya birleştirilmeyecektir.
+
+4. **Birleştir:** Flink'in bir committer'ı, katkının gereksinimleri karşılayıp karşılamadığını kontrol eder ve kodu kod tabanına birleştirir.
+
+ Not: Yazım hataları veya sözdizimi hataları gibi önemsiz acil düzeltmeler, Jira bileti olmadan [hotfix] pull request'i olarak açılabilir. +
+
+
+
+ + + + + +### 1. Jira Bileti Oluşturun ve Fikir Birliğine Varın + + +Apache Flink'e katkıda bulunmanın ilk adımı, Flink topluluğuyla fikir birliğine varmaktır. Bu, bir değişikliğin kapsamı ve uygulama yaklaşımı konusunda anlaşmak anlamına gelir. + +Çoğu durumda tartışma, [Flink'in hata izleyicisi: Jira]({{< relref "community" >}}#issue-tracker)'da gerçekleşmelidir. + +Aşağıdaki değişiklik türleri, [Flink Geliştirici e-posta listesinde]({{< relref "community" >}}#mailing-lists) bir `[DISCUSS]` konusu gerektirir: + +- büyük değişiklikler (önemli yeni özellik; büyük yeniden düzenlemeler, birden fazla bileşeni içeren) +- potansiyel olarak tartışmalı değişiklikler veya konular +- yaklaşımların belirsiz olduğu veya birden fazla eşit yaklaşımın olduğu değişiklikler + +Tartışma bir sonuca varmadan önce bu tür değişiklikler için bir Jira bileti açmayın. +Bir dev@ tartışmasına dayalı Jira biletlerinin o tartışmaya bağlantı vermesi ve sonucu özetlemesi gerekir. + + + +**Bir Jira biletinin fikir birliğine varması için gereksinimler:** + +- Resmi gereksinimler + - *Başlık* sorunu kısaca açıklar. + - *Açıklama*, sorunu veya özellik isteğini anlamak için gereken tüm ayrıntıları verir. + - *Bileşen* alanı ayarlanmıştır: Birçok committer ve katkıda bulunan sadece Flink'in belirli alt sistemlerine odaklanır. Uygun bileşeni ayarlamak, dikkatlerini çekmek için önemlidir. +- Biletin geçerli bir sorunu çözdüğü ve Flink için **iyi bir uyum** olduğu konusunda **anlaşma** vardır. + Flink topluluğu aşağıdaki hususları göz önünde bulundurur: + - Katkı, özellikler veya bileşenlerin davranışını, önceki kullanıcıların programlarını ve kurulumlarını bozabilecek şekilde değiştiriyor mu? Eğer öyleyse, bu değişikliğin arzu edilir olduğu konusunda bir tartışma ve anlaşma olmalıdır. + - Katkı kavramsal olarak Flink'e iyi uyuyor mu? Soyutlamaları/API'leri daha karmaşık hale getirecek kadar özel bir durum mu? + - Özellik Flink'in mimarisine iyi uyuyor mu? 
Ölçeklenecek mi ve Flink'i gelecek için esnek tutacak mı, yoksa özellik Flink'i gelecekte kısıtlayacak mı?
+  - Özellik, (mevcut bir parçanın iyileştirilmesinden ziyade) önemli bir yeni eklenti mi? Eğer öyleyse, Flink topluluğu bu özelliği sürdürmeyi taahhüt edecek mi?
+  - Bu özellik, Flink'in yol haritası ve şu anda devam eden çabalarla iyi uyumlu mu?
+  - Özellik, Flink kullanıcıları veya geliştiricileri için katma değer üretiyor mu? Yoksa ilgili bir kullanıcı veya geliştirici faydası sağlamadan regresyon riski mi getiriyor?
+  - Katkı, örneğin Apache Bahir'de veya başka bir harici depoda yaşayabilir mi?
+  - Bu, yalnızca açık kaynaklı bir projede commit almak için yapılan bir katkı mı (yazım hatalarını düzeltme, yalnızca zevke göre stil değişiklikleri yapma)?
+- Sorunun nasıl çözüleceği konusunda **fikir birliği** vardır. Bu, aşağıdaki hususları içerir:
+  - API ve veri geriye dönük uyumluluğu ile geçiş stratejileri
+  - Test stratejileri
+  - Flink'in derleme süresi üzerindeki etkisi
+  - Bağımlılıklar ve lisansları
+
+Eğer bir değişiklik Jira'daki tartışmada büyük veya tartışmalı bir değişiklik olarak tanımlanırsa, anlaşmaya ve fikir birliğine varmak için bir [Flink İyileştirme Önerisi (FLIP)](https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals) veya [Geliştirici e-posta listesinde]({{< relref "community" >}}#mailing-lists) bir tartışma gerekebilir.
+
+Katkıda bulunanlar, bileti açtıktan sonra birkaç gün içinde bir committer'dan ilk tepkiyi alabilirler. Bir bilet dikkat çekmezse, [geliştirici e-posta listesine]({{< relref "community" >}}#mailing-lists) ulaşmanızı öneririz. Flink topluluğunun bazen gelen tüm katkıları kabul edecek kapasitesi olmadığını unutmayın.
+
+
+Bilet tüm gereksinimleri karşıladığında, bir committer, bilet üzerinde çalışması için birini biletin *`Assignee`* alanına atayacaktır.
+Birini atama iznine yalnızca committer'lar sahiptir.
+
+**Atanmamış Jira biletlerine ait pull request'ler topluluk tarafından incelenmeyecek veya birleştirilmeyecektir.**
+
+
+### 2. Değişikliğinizi uygulayın
+
+Bir Jira sorununa atandıktan sonra, gerekli değişiklikleri uygulamaya başlayabilirsiniz.
+
+Uygulama sırasında akılda tutulması gereken diğer bazı noktalar:
+
+- [Bir Flink geliştirme ortamı kurun](https://cwiki.apache.org/confluence/display/FLINK/Setting+up+a+Flink+development+environment)
+- Flink'in [Kod Stili ve Kalite Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}})'nu takip edin
+- Jira sorunundaki veya tasarım belgesindeki tüm tartışmaları ve gereksinimleri dikkate alın.
+- İlgisiz sorunları tek bir katkıda karıştırmayın.
+
+
+### 3. Bir Pull Request Açın
+
+Pull request açmadan önce dikkat edilmesi gerekenler:
+
+- Tüm kontrollerin geçtiğinden, kodun derlendiğinden ve tüm testlerin geçtiğinden emin olmak için **`mvn clean verify`** komutunun değişikliklerinizle başarıyla çalıştığından emin olun.
+- [Flink'in uçtan uca testlerini](https://github.com/apache/flink/tree/master/flink-end-to-end-tests#running-tests) çalıştırın.
+- İlgisiz veya gereksiz yeniden biçimlendirme değişikliklerinin dahil edilmediğinden emin olun.
+- Commit geçmişinizin gereksinimlere uyduğundan emin olun.
+- Değişikliğinizin, taban dalınızdaki (base branch) en son commit'ler üzerine rebase edildiğinden emin olun.
+- Pull request'in ilgili Jira sorununa atıfta bulunduğundan ve her Jira sorununun tam olarak bir pull request'e karşılık geldiğinden emin olun (bir Jira için birden fazla pull request varsa, önce bu durumu çözün).
+
+Pull request açmadan önce veya açtıktan hemen sonra dikkate alınması gerekenler:
+
+- Dalın [Azure DevOps](https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2) üzerinde başarıyla derlendiğinden emin olun.
+
+Flink'teki kod değişiklikleri, [GitHub pull request'leri](https://help.github.com/en/articles/creating-a-pull-request) aracılığıyla incelenir ve kabul edilir.
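"Commit geçmişinizin gereksinimlere uyduğundan emin olun" adımı için, kılavuzdaki örneklerden çıkarılmış yaklaşık bir düzenli ifadeyle commit konu satırını denetleyen varsayımsal, küçük bir kabuk taslağı aşağıdadır. Fonksiyon adı, çıktı metinleri ve kalıbın kendisi resmi bir araca değil, yalnızca bu örneğe ait varsayımlara dayanır:

```shell
# Varsayımsal denetim: konu satırının "[FLINK-XXXX][bileşen] Açıklama"
# veya "[hotfix]..." kalıbına kabaca uyup uymadığına bakar.
check_commit_message() {
  if printf '%s' "$1" | grep -Eq '^\[(FLINK-[0-9]+|hotfix)\]( ?\[[A-Za-z0-9-]+\])* .+'; then
    echo "OK"
  else
    echo "GECERSIZ: $1"
  fi
}

check_commit_message "[FLINK-11704][tests] Testleri gelistir"   # → OK
check_commit_message "Rastgele bir mesaj"                       # → GECERSIZ: Rastgele bir mesaj
```

Böyle bir denetim, örneğin yerel bir `commit-msg` git hook'u içinden çağrılarak hatalı adlandırılmış commit'ler erken yakalanabilir.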
+
+[Bir pull request'i nasıl inceleyeceğiniz]({{< relref "how-to-contribute/reviewing-prs" >}}) konusunda, pull request inceleme sürecimizi de açıklayan ayrı bir kılavuz bulunmaktadır. Kod yazarı olarak, pull request'inizi tüm gereksinimleri karşılayacak şekilde hazırlamalısınız.
+
+
+### 4. Değişikliği birleştirin
+
+İnceleme tamamlandıktan sonra kod, Flink'in bir committer'ı tarafından birleştirilecektir. Ardından Jira bileti kapatılacaktır.
+
diff --git a/docs/content.tr/how-to-contribute/contribute-documentation.md b/docs/content.tr/how-to-contribute/contribute-documentation.md
new file mode 100644
index 0000000000..efabac607b
--- /dev/null
+++ b/docs/content.tr/how-to-contribute/contribute-documentation.md
@@ -0,0 +1,69 @@
+---
+title: Dokümantasyona Katkıda Bulunma
+bookCollapseSection: false
+weight: 20
+---
+
+# Dokümantasyona Katkıda Bulunma
+
+İyi bir dokümantasyon, her türlü yazılım için çok önemlidir. Bu, özellikle Apache Flink gibi dağıtık veri işleme motoru olan karmaşık yazılım sistemleri için geçerlidir. Apache Flink topluluğu, özlü, kesin ve eksiksiz dokümantasyon sağlamayı amaçlar ve Apache Flink'in dokümantasyonunu iyileştirmeye yönelik her türlü katkıyı memnuniyetle karşılar.
+
+## Dokümantasyon kaynaklarını edinme
+
+Apache Flink'in dokümantasyonu, kod tabanı ile aynı [git](http://git-scm.com/) deposunda tutulur. Bu, kod ve dokümantasyonun kolayca senkronize tutulabilmesini sağlamak için yapılır.
+
+Dokümantasyona katkıda bulunmanın en kolay yolu, [GitHub'daki Flink'in yansıtılmış deposunu](https://github.com/apache/flink) sağ üst köşedeki fork düğmesine tıklayarak kendi GitHub hesabınıza fork etmektir. GitHub hesabınız yoksa, ücretsiz olarak bir tane oluşturabilirsiniz.
+
+Ardından, fork'unuzu yerel makinenize klonlayın (`<kullanici-adiniz>` yer tutucusunu kendi GitHub kullanıcı adınızla değiştirin):
+
+```
+git clone https://github.com/<kullanici-adiniz>/flink.git
+```
+
+Dokümantasyon, Flink kod tabanının `docs/` alt dizininde bulunur.
+
+## Dokümantasyon üzerinde çalışmaya başlamadan önce...
+ +...lütfen katkınıza karşılık gelen bir [Jira](https://issues.apache.org/jira/browse/FLINK) sorunu olduğundan emin olun. Yazım hataları gibi önemsiz düzeltmeler dışında, tüm dokümantasyon değişikliklerinin bir Jira sorununa atıfta bulunmasını gerektiriyoruz. + +Ayrıca, erişilebilir, tutarlı ve kapsayıcı dokümantasyon yazma konusunda bazı rehberlik için [Dokümantasyon Stil Kılavuzu]({{< relref "how-to-contribute/documentation-style-guide" >}}) sayfasına göz atın. + +## Dokümantasyonu güncelleme veya genişletme + +Flink dokümantasyonu [Markdown](http://daringfireball.net/projects/markdown/) ile yazılmıştır. Markdown, HTML'ye çevrilebilen hafif bir işaretleme dilidir. + +Dokümantasyonu güncellemek veya genişletmek için Markdown (`.md`) dosyalarını değiştirmeniz gerekir. Lütfen değişikliklerinizi, yapım komut dosyasını önizleme modunda başlatarak doğrulayın. + +``` +./build_docs.sh -p +``` + +Bu komut dosyası, Markdown dosyalarını statik HTML sayfalarına derler ve yerel bir web sunucusu başlatır. Derlenen dokümantasyonu değişikliklerinizle birlikte görüntülemek için tarayıcınızı `http://localhost:1313/` adresinde açın. Markdown dosyalarını değiştirip kaydettiğinizde ve tarayıcınızı yenilediğinizde, sunulan dokümantasyon otomatik olarak yeniden derlenir ve güncellenir. + +Lütfen geliştirici e-posta listesinde her türlü sorunuzu sormaktan çekinmeyin. + +## Çince dokümantasyon çevirisi + +Flink topluluğu hem İngilizce hem de Çince dokümantasyonu sürdürmektedir. Dokümantasyonu güncellemek veya genişletmek istiyorsanız, hem İngilizce hem de Çince dokümantasyon güncellenmelidir. Çince diline aşina değilseniz, lütfen mevcut JIRA sorunuyla bağlantılı olarak Çince dokümantasyon çevirisi için `chinese-translation` bileşeni ile etiketlenmiş bir JIRA açın. Çince diline aşina iseniz, her iki tarafı da bir pull request'te güncellemeniz teşvik edilir. + +*NOT: Flink topluluğu hala Çince dokümantasyonları çevirme sürecindedir, bazı belgeler henüz çevrilmemiş olabilir. 
Güncellediğiniz belge henüz çevrilmemişse, İngilizce değişiklikleri Çince belgeye kopyalayabilirsiniz.*
+
+Çince belgeler `content.zh/docs` klasöründe bulunmaktadır. İngilizce belge değişikliklerine göre `content.zh/docs` klasöründeki Çince dosyayı güncelleyebilir veya genişletebilirsiniz.
+
+## Katkınızı gönderme
+
+Flink projesi, dokümantasyon katkılarını [GitHub Mirror](https://github.com/apache/flink) üzerinden [pull request'ler](https://help.github.com/articles/using-pull-requests) olarak kabul eder. Pull request'ler, değişiklikleri içeren bir kod dalına işaret ederek yama sunmanın basit bir yoludur.
+
+Bir pull request hazırlamak ve göndermek için şu adımları izleyin:
+
+1. Değişikliklerinizi yerel git deponuza commit edin. Commit mesajı, `[FLINK-XXXX]` ile başlayarak ilgili Jira sorununa işaret etmelidir.
+
+2. Commit ettiğiniz katkıyı GitHub'daki Flink depo fork'unuza push edin.
+
+   ```
+   git push origin myBranch
+   ```
+
+3. Depo fork'unuzun web sitesine gidin (`https://github.com/<kullanici-adiniz>/flink`) ve pull request oluşturmaya başlamak için "Create Pull Request" düğmesini kullanın. Temel (base) fork'un `apache/flink master`, head fork'un ise değişikliklerinizi içeren dal olduğundan emin olun. Pull request'e anlamlı bir açıklama verin ve gönderin.
+
+Bir yamayı bir [Jira]({{< param FlinkIssuesUrl >}}) sorununa eklemek de mümkündür.
+
diff --git a/docs/content.tr/how-to-contribute/documentation-style-guide.md b/docs/content.tr/how-to-contribute/documentation-style-guide.md
new file mode 100644
index 0000000000..0dc15f24dd
--- /dev/null
+++ b/docs/content.tr/how-to-contribute/documentation-style-guide.md
@@ -0,0 +1,366 @@
+---
+title: Dokümantasyon Stil Kılavuzu
+bookCollapseSection: false
+weight: 21
+---
+
+# Dokümantasyon Stil Kılavuzu
+
+Bu kılavuz, Flink dokümantasyonunu yazmaya ve dokümantasyona katkıda bulunmaya yönelik temel stil kurallarına genel bir bakış sağlar. 
Mevcut dokümantasyonu iyileştirme ve genişletme konusundaki topluluk çabasına katkı yolculuğunuzu desteklemek ve dokümantasyonu daha **erişilebilir**, **tutarlı** ve **kapsayıcı** hale getirmeye yardımcı olmak amacıyla hazırlanmıştır. + +## Dil + +Flink dokümantasyonu **ABD İngilizcesi** ve **Çince** dillerinde sürdürülür — dokümantasyonu genişletirken veya güncellerken, her iki sürüm de tek bir pull request ile ele alınmalıdır. Çince diline aşina değilseniz, katkınızın şu ek adımlarla tamamlandığından emin olun: + +* [JIRA]({{< relref "community" >}}#issue-tracker) üzerinde chinese-translation bileşeni ile etiketlenmiş bir çeviri bileti açın; +* Bileti orijinal katkı JIRA biletine bağlayın. + +Mevcut dokümantasyonu Çince'ye çevirmeye katkıda bulunmak için stil kılavuzları mı arıyorsunuz? [Bu çeviri şartnamesine](https://cwiki.apache.org/confluence/display/FLINK/Flink+Translation+Specifications) danışabilirsiniz. + +## Dil Stili + +Aşağıda, yazılarınızda okunabilirliği ve erişilebilirliği sağlamaya yardımcı olabilecek bazı temel kurallar bulabilirsiniz. Dil stili hakkında daha derin ve eksiksiz bir inceleme için [Genel Yönlendirici İlkeler](#genel-yönlendirici-i̇lkeler) bölümüne de bakın. + +### Ses ve Ton + +* **Etken çatı kullanın.** [Etken çatı](https://medium.com/@DaphneWatson/technical-writing-active-vs-passive-voice-485dfaa4e498), kısalığı destekler ve içeriği daha çekici hale getirir. Bir cümledeki fiilin ardından _zombiler tarafından_ ifadesini eklerseniz ve hala mantıklı oluyorsa, edilgen çatı kullanıyorsunuz demektir. + + * **Etken Çatı** + "Bu örneği IDE'nizde veya komut satırında çalıştırabilirsiniz." + * **Edilgen Çatı** + "Bu örnek, IDE'nizde veya komut satırında çalıştırılabilir (zombiler tarafından)." + +* **Biz değil, siz kullanın.** _Biz_ kullanmak bazı kullanıcılar için kafa karıştırıcı ve küçümseyici olabilir, "hepimiz gizli bir kulübün üyesiyiz ve _sen_ bir üyelik daveti almadın" izlenimi verebilir. Kullanıcıya _siz_ olarak hitap edin. 
+
+* **Cinsiyete ve kültüre özgü dilden kaçının.** Dokümantasyonda cinsiyet belirtmeye gerek yoktur: teknik yazı [cinsiyet açısından nötr](https://techwhirl.com/gender-neutral-technical-writing/) olmalıdır. Ayrıca, kendi dilinizde veya kültürünüzde doğal saydığınız jargon ve gelenekler başka yerlerde genellikle farklıdır. Mizah bunun tipik bir örneğidir: bir kültürde harika olan bir şaka, başka bir kültürde yanlış anlaşılabilir.
+
+* **Eylemleri nitelendirmekten ve önyargılı değerlendirmelerden kaçının.** Bir eylemi tamamlamakta zorlanan veya hayal kırıklığına uğrayan bir kullanıcı için _hızlı_ veya _kolay_ gibi kelimeler kullanmak, kötü bir dokümantasyon deneyimine yol açabilir.
+
+* **İfadeleri vurgulamak için BÜYÜK HARF kullanmaktan kaçının.** Anahtar kelimeleri **kalın** veya _italik_ yazı tipiyle vurgulamak genellikle daha nazik görünür. Önemli ancak açık olmayan ifadelere dikkat çekmek istiyorsanız, bunları uygun bir HTML etiketiyle vurgulanan bir etiketle başlayan ayrı paragraflar halinde gruplamayı deneyin:
+  * `Not`
+  * `Uyarı`
+  * `Tehlike`
+
+### Flink'e Özgü Terimleri Kullanma
+
+Terimlerin net tanımlarını kullanın veya bir şeyin ne anlama geldiğine dair, diğer dokümantasyon sayfaları veya {{< docs_link file="flink-docs-stable/docs/concepts/glossary" name="Flink Sözlüğü">}} gibi yararlı kaynaklara bağlantı vererek ek yönlendirme sağlayın. Sözlük hâlâ geliştirilme aşamasındadır; bu nedenle bir pull request açarak yeni terimler de önerebilirsiniz.
+
+## Depo
+
+Markdown dosyaları (`.md`), kapsanan konuyu özetleyen, **küçük harfle** yazılmış ve kelimeleri **tire (-)** ile ayrılmış kısa bir ada sahip olmalıdır. Çince sürüm dosyası, İngilizce sürümle aynı ada sahip olmalı, ancak **content.zh** klasöründe saklanmalıdır.
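Adlandırma kuralı ("küçük harf, kelimeler arasında tire") basit bir kabuk taslağıyla örneklenebilir. Buradaki `slugify` adı ve davranışı yalnızca bu örneğe ait bir varsayımdır ve yalnızca ASCII karakterleri ele alır (Türkçe veya Çince karakterler ek dönüşüm gerektirir):

```shell
# Varsayımsal yardımcı: bir sayfa başlığını, küçük harfli ve kelimeleri
# tire ile ayrılmış bir dosya adı gövdesine çevirir (yalnızca ASCII).
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'
}

slugify "Documentation Style Guide"   # → documentation-style-guide
```

Sonuca `.md` uzantısı eklenerek örneğin `documentation-style-guide.md` dosya adı elde edilir.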
+
+## Sözdizimi
+
+Dokümantasyon web sitesi [Hugo](https://gohugo.io/) kullanılarak oluşturulur ve sayfalar, web yayıncılığı için hafif ve taşınabilir bir format olan (ancak bununla sınırlı olmayan) [Markdown](https://daringfireball.net/projects/markdown/syntax) ile yazılır.
+
+### Genişletilmiş Sözdizimi
+
+Markdown, [GitHub Flavored Markdown](https://guides.github.com/features/mastering-markdown/) ve düz [HTML](http://www.simplehtmlguide.com/cheatsheet.php) ile birlikte de kullanılabilir. Örneğin, bazı katkıda bulunanlar görseller için HTML etiketlerini tercih eder ve bu karışımı serbestçe kullanabilirler.
+
+### Ön Kısım (Front Matter)
+
+Markdown'a ek olarak, her dosya, sayfadaki değişkenleri ve meta verileri ayarlamak için kullanılan bir YAML [ön kısım bloğu](https://jekyllrb.com/docs/front-matter/) içerir. Ön kısım, dosyadaki ilk öğe olmalı ve üçlü tire satırları arasında geçerli bir YAML kümesi olarak belirtilmelidir:
+
+```
+---
+title: Concepts
+layout: redirect
+---
+```
+
+### Apache Lisansı
+
+Her dokümantasyon dosyasında, ön kısımdan hemen sonra Apache Lisansı ifadesi gelmelidir. Her iki dil sürümü için de bu blok ABD İngilizcesinde belirtilmeli ve aşağıdaki örnekle tam olarak aynı kelimelerle kopyalanmalıdır (blok, dosyada bir HTML yorumu olarak yer alır):
+
+```
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+```
+
+Aşağıda, Flink dokümantasyonunda en yaygın kullanılan ön kısım değişkenleri bulunmaktadır.
+
+| | Değişken | Olası Değerler | Açıklama |
+| --- | --- | --- | --- |
+| **Layout** | `layout` | `{base,plain,redirect}` | Kullanılacak düzen dosyası. Düzen dosyaları `_layouts` dizini altında bulunur. |
+| **İçerik** | `title` | `%s` | Sayfa için en üst düzey (Seviye-1) başlık olarak kullanılacak başlık. |
+| **Navigasyon** | `nav-id` | `%s` | Sayfanın ID'si. Diğer sayfalar bu ID'yi `nav-parent_id` olarak kullanabilir. |
+| | `nav-parent_id` | `{root,%s}` | Üst sayfanın ID'si. En düşük navigasyon seviyesi `root`'tur. |
+| | `nav-pos` | `%d` | Navigasyon seviyesi başına sayfaların göreceli konumu. |
+| | `nav-title` | `%s` | Varsayılan bağlantı metnini (başlık) geçersiz kılmak için kullanılacak başlık. |
+
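Yukarıda listelenen değişkenlerin bir arada nasıl görünebileceğine dair varsayımsal, küçük bir ön kısım örneği (başlık ve ID değerleri tamamen uydurmadır):

```yaml
---
title: Dinamik Tablolar
nav-id: dynamic-tables
nav-parent_id: root
nav-pos: 3
---
```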
+ +`_config.yml` altında bulunan dokümantasyon genelindeki bilgiler ve yapılandırma ayarları da site değişkeni aracılığıyla ön kısma sunulur. Bu ayarlara aşağıdaki sözdizimi kullanılarak erişilebilir: + +```liquid +{{ "{{ site.CONFIG_KEY " }}}} +``` +Yer tutucu, dokümantasyon oluşturulurken `CONFIG_KEY` adlı değişkenin değeriyle değiştirilecektir. + +## Biçimlendirme + +Aşağıdaki bölümlerde, tutarlı ve gezinmesi kolay dokümantasyon yazma konusunda sizi başlatacak temel biçimlendirme kuralları listelenmiştir. + +### Başlıklar + +Markdown'da başlıklar, başında diyez işareti (#) olan herhangi bir satırdır; diyez sayısı başlık düzeyini gösterir. Başlıklar iç içe ve ardışık olmalıdır — stil nedeniyle asla bir başlık düzeyini atlamayın! + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Sözdizimi | Seviye | Açıklama |
|-----------|--------|----------|
| `# Başlık` | Seviye-1 | Sayfa başlığı Ön Kısımda tanımlanır, bu nedenle bu seviye kullanılmamalıdır. |
| `## Başlık` | Seviye-2 | Bölümler için başlangıç seviyesi. İçeriği daha üst düzey konulara veya hedeflere göre düzenlemek için kullanılır. |
| `### Başlık` | Seviye-3 | Alt bölümler. Destekleyici bilgileri veya görevleri ayırmak için her bölümde kullanılır. |
| `#### Başlık` | Seviye-4 | Alt alt bölümler. Bir alt bölüm içindeki ayrıntıları düzenlemek için kullanılır. |
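Örneğin, seviyeleri atlamadan iç içe geçen başlıklar şöyle görünür (başlık metinleri yalnızca örnektir):

```markdown
## Dinamik Tablolar ve Sürekli Sorgular

### Akışlar Üzerinde Tablolar

#### Ekleme (Append) Modu
```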
+ +#### En İyi Uygulama + +Başlıkların ifadesinde açıklayıcı bir dil kullanın. Örneğin, dinamik tablolar hakkında bir dokümantasyon sayfası için, "Arka Plan" veya "Teknik Bilgi" yerine "Dinamik Tablolar ve Sürekli Sorgular" daha açıklayıcıdır. + +### İçindekiler + +Dokümantasyon oluşturulurken, **İçindekiler** (TOC) otomatik olarak aşağıdaki işaretleme satırı kullanılarak sayfanın başlıklarından oluşturulur: + +```liquid +{{ "{:toc" }}} +``` + +**Seviye-3**'e kadar tüm başlıklar dikkate alınır. Belirli bir başlığı TOC'den hariç tutmak için: + +```liquid +{{ "# Hariç Tutulan Başlık +{:.no_toc" }}} +``` + +#### En İyi Uygulama + +Ele alınan konuya kısa ve öz bir giriş yazın ve bunu TOC'den önce yerleştirin. Temel mesajların bir taslağı gibi küçük bir bağlam, dokümantasyonun tutarlı olmasını ve her bilgi seviyesindeki kişiler tarafından anlaşılabilir olmasını sağlamada uzun bir yol kat eder. + + +### Navigasyon + +Dokümantasyon oluşturulurken, navigasyon her sayfanın [ön kısım değişkenleri](#ön-kısım-front-matter) içinde yapılandırılan özellikler kullanılarak tanımlanır. + +Kapsamlı dokümantasyon sayfalarında, kullanıcıların manuel olarak yukarı kaydırmadan sayfanın başına gidebilmelerini sağlayan _Başa Dön_ bağlantıları kullanmak mümkündür. İşaretlemede bu, dokümantasyon oluşturulduğunda varsayılan bir bağlantı ile değiştirilen bir yer tutucu olarak uygulanır: + +```liquid +{{ "{% top " }}%} +``` + +#### En İyi Uygulama + +Başa Dön bağlantılarını en azından her Seviye-2 bölümünün sonunda kullanmanız önerilir. + +### Açıklamalar + +Dokümantasyona uç durumları, sıkı ilişkili bilgileri veya bilinmesi güzel bilgileri dahil etmek istediğinizde, bunları özel açıklamalar kullanarak vurgulamak çok iyi bir uygulamadır. + +* Yararlı olabilecek bir ipucu veya bilgi parçasını vurgulamak için: + + ```html +
<div class="alert alert-info">
  // Bilgi Mesajı
</div>
+ ``` + +* Tuzakların tehlikesini bildirmek veya takip edilmesi kritik öneme sahip önemli bir bilgi parçasına dikkat çekmek için: + + ```html +
<div class="alert alert-danger">
  // Tehlike Mesajı
</div>
+ ``` + +### Bağlantılar + +Dokümantasyona bağlantılar eklemek, kullanıcıyı üzerine yazma riski olmadan konuyu daha iyi anlamasına yönlendirmenin etkili bir yoludur. + +* **Sayfadaki bölümlere bağlantılar.** Her başlık, bir sayfa içinde doğrudan bağlantı vermek için örtük bir tanımlayıcı oluşturur. Bu tanımlayıcı, başlığı küçük harfe çevirerek ve iç boşlukları kısa çizgilerle değiştirerek oluşturulur. + + * **Başlık:** ## Başlık Adı + * **ID:** #başlık-adı +

+ + ```liquid + [Bağlantı Metni](#başlık-adı) + ``` + +* **Flink dokümantasyonunun diğer sayfalarına bağlantılar.** + + ```liquid + [Bağlantı Metni]({% link path/to/link-page.md %}) + ``` + +* **Harici sayfalara bağlantılar** + + ```liquid + [Bağlantı Metni](external_url) + ``` + +#### En İyi Uygulama + +Eylem veya hedef hakkında bilgi veren açıklayıcı bağlantılar kullanın. Örneğin, "Daha Fazla Bilgi" veya "Buraya Tıklayın" bağlantıları kullanmaktan kaçının. + +### Görsel Öğeler + +Şekiller ve diğer görsel öğeler kök _fig_ klasörü altına yerleştirilir ve dokümantasyon sayfalarında bağlantılara benzer bir sözdizimi kullanılarak referans gösterilebilir: + +```liquid +{{< img src="/fig/image_name.png" alt="Resim Metni" width="200px" >}} +``` + +#### En İyi Uygulama + +Akış şemaları, tablolar ve şekilleri uygun veya gerekli olduğunda ek açıklama için kullanın, ancak asla tek başına bilgi kaynağı olarak kullanmayın. Bu öğelere dahil edilen herhangi bir metnin okunabilecek kadar büyük olduğundan ve genel çözünürlüğün yeterli olduğundan emin olun. + +### Kod + +* **Satır içi kod.** Normal metin akışında küçük kod parçaları veya dil yapılarına referanslar çevreleyen ters tırnak işaretleriyle ( **\`** ) vurgulanmalıdır. + +* **Kod blokları.** Kendi başına yeterli örnekleri, özellik tanıtımlarını, en iyi uygulamaların gösterimini veya diğer yararlı senaryoları temsil eden kod, uygun [sözdizimi vurgulaması](https://github.com/rouge-ruby/rouge/wiki/List-of-supported-languages-and-lexers) ile çevrili bir kod bloğu kullanılarak sarılmalıdır. Bunu işaretleme ile elde etmenin bir yolu: + + ````liquid + ```java + // Java Kodu + ``` + ```` + +Birden fazla programlama dili belirtirken, her kod bloğu bir sekme olarak şekillendirilmelidir: + + ```html +
  <div class="codetabs" markdown="1">
  <div data-lang="java" markdown="1">

  ```java
  // Java Kodu
  ```

  </div>
  <div data-lang="scala" markdown="1">

  ```scala
  // Scala Kodu
  ```

  </div>
  </div>
  ```

Bu kod blokları genellikle öğrenmek ve keşfetmek için kullanılır; bu nedenle akılda tutulması gereken bazı en iyi uygulamalar vardır:

* **Anahtar geliştirme görevlerini sergileyin.** Kod örneklerini, kullanıcılar için anlamlı olan yaygın uygulama senaryoları için saklayın. Daha uzun ve karmaşık örnekleri öğreticilere veya adım adım kılavuzlara bırakın.

* **Kodun bağımsız olduğundan emin olun.** Kod örnekleri kendi kendine yeterli olmalı ve harici bağımlılıklara sahip olmamalıdır (belirli konnektörlerin nasıl kullanılacağına ilişkin örnekler gibi aykırı durumlar hariç). Joker karakterler kullanmadan tüm import ifadelerini dahil edin; böylece yeni başlayanlar hangi paketlerin kullanıldığını anlayabilir ve öğrenebilir.

* **Kısayollardan kaçının.** Örneğin, gerçek dünya kodunda yapacağınız gibi istisnaları ve temizleme işlemlerini ele alın.

* **Yorumlar ekleyin, ancak abartmayın.** Kodun ana işlevselliğini ve okunduğunda açık olmayabilecek olası tuzakları açıklayan bir giriş sağlayın. Uygulama ayrıntılarını açıklamak ve beklenen çıktıyı tanımlamak için yorumlar kullanın.

* **Kod bloklarındaki komutlar.** Komutlar, `bash` sözdizimi vurgulamalı kod blokları kullanılarak belgelenebilir. Dokümantasyona komut eklerken aşağıdaki hususlar dikkate alınmalıdır:
  * **Uzun parametre adları kullanın.** Uzun parametre adları, okuyucunun komutun amacını anlamasına yardımcı olur ve kısa muadillerine tercih edilmelidir.
  * **Satır başına bir parametre.** Uzun parametre adları, tek satıra yazılan bir komutun okunmasını zorlaştırabilir. Her satıra bir parametre koymak okunabilirliği artırır. Kopyala-yapıştır işlemini desteklemek için her ara satırın sonuna, satırın devam ettiğini belirten bir ters eğik çizgi `\` eklemeniz gerekir.
  * **Girinti.** Her yeni parametre satırı 6 boşlukla girintilenmelidir.
  * **Komut başlangıcını belirtmek için `$` öneki kullanın.** Birden fazla komut olduğunda kod bloğunun okunabilirliği kötüleşebilir.
Her yeni komutun önüne dolar işareti `$` koymak, bir komutun başlangıcını belirlemeye yardımcı olur.

Doğru biçimlendirilmiş bir komut şöyle görünür:

```bash
$ ./bin/flink run-application \
      --target kubernetes-application \
      -Dkubernetes.cluster-id=my-first-application-cluster \
      -Dkubernetes.container.image=custom-image-name \
      local:///opt/flink/usrlib/my-flink-job.jar
```

## Genel Yönlendirici İlkeler

Bu stil kılavuzu, **Erişilebilir**, **Tutarlı**, **Nesnel**, **Mantıklı** ve **Kapsayıcı** dokümantasyon için bir temel oluşturmak gibi kapsayıcı bir amaca sahiptir.

#### Erişilebilir

Flink topluluğu çeşitli ve uluslararasıdır; bu nedenle dokümantasyon yazarken geniş ve küresel düşünmeniz gerekir. Herkes İngilizceyi anadil düzeyinde konuşmaz ve Flink (ve genel olarak stream processing) ile ilgili deneyim seviyeleri, mutlak yeni başlayanlardan deneyimli ileri düzey kullanıcılara kadar değişir. Ürettiğiniz içerikte teknik doğruluğu ve dilsel netliği sağlayın; böylece içerik her seviyedeki kullanıcı tarafından anlaşılabilir.

#### Tutarlı

Bu stil kılavuzunda detaylandırılan temel kurallara bağlı kalın ve metni yazım, büyük harf kullanımı, kısa çizgi kullanımı, kalın ve italik yazma konusunda aynı şekilde biçimlendirmek için sağduyunuzu kullanın. Doğru dilbilgisi, noktalama ve yazım arzu edilir, ancak katı bir gereklilik değildir; dokümantasyon katkıları her düzeyde dil yeterliliğine açıktır.

#### Nesnel

Cümlelerinizi kısa ve öz tutun. Genel bir kural olarak, 14 kelimeden kısa bir cümlenin içeriğinin yaklaşık yüzde 90'ı okuyucular tarafından anlaşılır. 25'ten fazla kelime içeren cümlelerin anlaşılması genellikle daha zordur; bu tür cümleler mümkün olduğunda gözden geçirilmeli ve bölünmelidir. Kısa ve öz olmak ve iyi bilinen anahtar kelimeler kullanmak, kullanıcıların ilgili dokümantasyona az çabayla ulaşmalarını sağlar.
+ +#### Mantıklı + +Çoğu kullanıcının çevrimiçi içeriği tarayacağını ve sadece [yüzde 28'ini](https://www.nngroup.com/articles/website-reading/) okuyacağını unutmayın. Bu, ilgili fikirleri açık bir bilgi hiyerarşisinde bir araya getirmenin ve odaklanmış, açıklayıcı başlıklar kullanmanın önemini vurgular. Her bölümün ilk iki paragrafına en ilgili bilgileri yerleştirmek, kullanıcı için "harcanan zamanın geri dönüşünü" artıran iyi bir uygulamadır. + +#### Kapsayıcı + +İçeriğin tüm kullanıcılar tarafından bulunabilir ve onlara açık olmasını sağlamak için olumlu bir dil ve somut, ilişkilendirilebilir örnekler kullanın. Dokümantasyon diğer dillere çevrilir, bu nedenle basit bir dil ve tanıdık kelimeler kullanmak çeviri çabasını da azaltmaya yardımcı olur. \ No newline at end of file diff --git a/docs/content.tr/how-to-contribute/getting-help.md b/docs/content.tr/how-to-contribute/getting-help.md new file mode 100644 index 0000000000..c55d33ed94 --- /dev/null +++ b/docs/content.tr/how-to-contribute/getting-help.md @@ -0,0 +1,149 @@ +--- +title: Yardım Alma +bookCollapseSection: false +weight: 25 +aliases: +- /getting-help.html +- /getting-help/index.html + +--- + + +# Yardım Alma + +## Bir Sorunuz mu Var? + +Apache Flink topluluğu her gün birçok kullanıcı sorusunu yanıtlar. Arşivlerde yanıt ve tavsiye arayabilir veya yardım ve rehberlik için toplulukla iletişime geçebilirsiniz. + +### Kullanıcı E-posta Listesi + +Birçok Flink kullanıcısı, katkıda bulunan ve committer, Flink'in kullanıcı e-posta listesine abonedir. Kullanıcı e-posta listesi, yardım istemek için çok iyi bir yerdir. + +E-posta listesine göndermeden önce, aşağıdaki web sitelerinde sizin sorunlarınızla ilgili konuları tartışan e-posta dizilerini aramak için e-posta listesi arşivlerini arayabilirsiniz. + +- [Apache E-posta Listesi Arşivi](https://lists.apache.org/list.html?user@flink.apache.org) + +E-posta listesine göndermek istiyorsanız, şunları yapmanız gerekir: + +1. 
`user-subscribe@flink.apache.org` adresine bir e-posta göndererek e-posta listesine abone olun, +2. Onay e-postasını yanıtlayarak aboneliği onaylayın ve +3. E-postanızı `user@flink.apache.org` adresine gönderin. + +Abone değilseniz e-postanıza yanıt alamayacağınızı lütfen unutmayın. + +### Slack + +[Slack'teki Apache Flink topluluğuna katılabilirsiniz.]({{< param FlinkSlackInviteUrl >}}) +Slack'te bir hesap oluşturduktan sonra, #introductions kanalında kendinizi tanıtmayı unutmayın. +Slack'in sınırlamaları nedeniyle davet bağlantısı 100 davetten sonra süresi dolar. Süresi dolmuşsa, lütfen [Dev e-posta listesi]({{< relref "community" >}}#mailing-lists) ile iletişime geçin. +Herhangi bir mevcut Slack üyesi de başka herhangi birini katılmaya davet edebilir. + +Birkaç topluluk kuralı vardır: + +* **Saygılı olun** - Bu en önemli kuraldır! +* Tüm önemli kararlar ve sonuçlar **e-posta listelerine yansıtılmalıdır.** + "Eğer bir e-posta listesinde olmadıysa, olmamıştır." - [Apache Motto'ları](http://theapacheway.com/on-list/) +* Paralel konuşmaların bir kanalı bunaltmasını önlemek için **Slack dizilerini** kullanın. +* Ya [#pyflink](https://apache-flink.slack.com/archives/C03G7LJTS2G) (tüm Python Flink soruları için) ya da [#troubleshooting](https://apache-flink.slack.com/archives/C03G7LJTS2G) (diğer tüm Flink soruları için) kullanın. +* Sorun giderme, Jira atama ve PR incelemesi için lütfen insanlara **doğrudan mesaj göndermeyin**. Bunu yapmak Slack'ten çıkarılmanıza neden olabilir. + +### Stack Overflow + +Flink topluluğunun birçok üyesi [Stack Overflow](https://stackoverflow.com)'da aktiftir. [\[apache-flink\]](https://stackoverflow.com/questions/tagged/apache-flink) etiketini kullanarak sorular ve cevaplar arayabilir veya sorularınızı gönderebilirsiniz. + +## Bir Hata mı Buldunuz? 
+ +Bir hata nedeniyle oluşabilecek beklenmeyen bir davranış gözlemlerseniz, bildirilmiş hataları arayabilir veya [Flink'in JIRA'sında]({{< relref "community#issue-tracker" >}}) bir hata raporu oluşturabilirsiniz. + +Beklenmeyen davranışın bir hata nedeniyle mi oluştuğundan emin değilseniz, lütfen [kullanıcı e-posta listesine]({{< relref "community" >}}#user-mailing-list) bir soru gönderin. + +## Bir Hata Mesajı mı Aldınız? + +Bir hata mesajının nedenini belirlemek zor olabilir. Aşağıda, en yaygın hata mesajlarını listeliyor ve bunları nasıl ele alacağınızı açıklıyoruz. + +### NotSerializableException hatası alıyorum. + +Flink, uygulama mantığının kopyalarını (uyguladığınız fonksiyonlar ve işlemler, program yapılandırması vb.) paralel çalışan işlemlere dağıtmak için Java serileştirmesini kullanır. Bu nedenle, API'ye aktardığınız tüm fonksiyonlar, [java.io.Serializable](http://docs.oracle.com/javase/8/docs/api/java/io/Serializable.html) tarafından tanımlandığı gibi serileştirilebilir olmalıdır. + +Fonksiyonunuz anonim bir iç sınıfsa, şunları düşünün: + +- Fonksiyonu bağımsız bir sınıf veya statik bir iç sınıf haline getirin. +- Java 8 lambda fonksiyonu kullanın. + +Fonksiyonunuz zaten statik bir sınıfsa, sınıfın bir örneğini oluşturduğunuzda atadığınız alanları kontrol edin. Alanlardan biri büyük olasılıkla serileştirilemeyen bir türü tutuyor. + +- Java'da, bir `RichFunction` kullanın ve sorunlu alanları `open()` metodunda başlatın. +- Scala'da, genellikle başlatmayı dağıtılmış yürütme gerçekleşene kadar ertelemek için "lazy val" kullanabilirsiniz. Bu, küçük bir performans maliyetine neden olabilir. Doğal olarak Scala'da da bir `RichFunction` kullanabilirsiniz. + +### Scala API'sini kullanırken, implicit değerler ve evidence parametreleri hakkında bir hata alıyorum. + +Bu hata, tür bilgisi için implicit değerin sağlanamadığı anlamına gelir. 
Kodunuzda bir `import org.apache.flink.streaming.api.scala._` (DataStream API) veya bir `import org.apache.flink.api.scala._` (DataSet API) ifadesinin olduğundan emin olun. + +Genel parametreler alan fonksiyonlar veya sınıflar içinde Flink işlemleri kullanıyorsanız, o parametre için bir TypeInformation mevcut olmalıdır. Bu, bir bağlam sınırı kullanılarak elde edilebilir: + +~~~scala +def myFunction[T: TypeInformation](input: DataSet[T]): DataSet[Seq[T]] = { + input.reduceGroup( i => i.toSeq ) +} +~~~ + +Flink'in türleri nasıl ele aldığına dair derinlemesine bir tartışma için [Tür Çıkarma ve Serileştirme]({{< param DocsBaseUrl >}}/dev/types_serialization.html) bölümüne bakın. + +### ClassCastException görüyorum: X, X'e dönüştürülemiyor. + +`com.foo.X`, `com.foo.X`'e dönüştürülemiyor (veya `com.foo.X`'e atanamıyor) tarzında bir istisna gördüğünüzde, bu `com.foo.X` sınıfının birden fazla versiyonunun farklı sınıf yükleyiciler tarafından yüklendiği ve bu sınıfın türlerinin birbirine atanmaya çalışıldığı anlamına gelir. + +Bunun nedeni şunlar olabilir: + +- `child-first` sınıf yükleme yoluyla sınıf çoğaltma. Bu, kullanıcıların Flink'in kullandığı aynı bağımlılıkların farklı sürümlerini kullanmasına izin vermek için tasarlanmış bir mekanizmadır. Ancak, bu sınıfların farklı kopyaları Flink'in çekirdeği ve kullanıcı uygulama kodu arasında hareket ederse, böyle bir istisna oluşabilir. Bunun nedenini doğrulamak için, yapılandırmada `classloader.resolve-order: parent-first` ayarını yapmayı deneyin. Eğer bu hata kaybolursa, bir hata olup olmadığını kontrol etmek için lütfen e-posta listesine yazın. + +- Guava'nın Interners'ı veya Avro'nun Schema önbelleği gibi yardımcı programlar tarafından farklı yürütme girişimlerinden sınıfların önbelleğe alınması. Interners kullanmamaya çalışın veya yeni bir görev yürütmesi başlatıldığında yeni bir önbelleğin oluşturulduğundan emin olmak için interners/önbelleğin kapsamını azaltın. 
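Sınıf çoğaltma şüphesini doğrulamak için, yukarıda bahsedilen `classloader.resolve-order` ayarı yapılandırma dosyasında (burada `flink-conf.yaml` olduğu varsayılmıştır) geçici olarak şöyle değiştirilebilir:

```yaml
# Geçici hata ayıklama ayarı: hata bu ayarla kaybolursa sorun,
# child-first sınıf yüklemedeki sınıf çoğaltmadan kaynaklanıyor olabilir.
classloader.resolve-order: parent-first
```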
+ +### AbstractMethodError veya NoSuchFieldError hatası alıyorum. + +Bu tür hatalar genellikle bazı bağımlılık sürümlerinde bir karışıklığı gösterir. Bu, yürütme sırasında yüklenen bir bağımlılığın (bir kütüphanenin) sürümünün, kodun derlendiği sürümden farklı olduğu anlamına gelir. + +Flink 1.4.0'dan itibaren, uygulama JAR dosyanızdaki bağımlılıklar, Flink'in çekirdeği tarafından kullanılan bağımlılıklara veya sınıf yolundaki diğer bağımlılıklara (örneğin Hadoop'tan) göre farklı sürümlere sahip olabilir. Bu, varsayılan olan `child-first` sınıf yüklemenin etkinleştirilmesini gerektirir. + +Bu sorunları Flink 1.4+ sürümünde görüyorsanız, aşağıdakilerden biri doğru olabilir: + +- Uygulama kodunuzda bir bağımlılık sürüm çakışması var. Tüm bağımlılık sürümlerinizin tutarlı olduğundan emin olun. +- Flink'in `child-first` sınıf yükleme yoluyla destekleyemediği bir kütüphane ile çakışıyorsunuz. Şu anda bunlar, Scala standart kütüphane sınıfları, Flink'in kendi sınıfları, loglama API'leri ve herhangi bir Hadoop çekirdek sınıfıdır. + + +### DataStream uygulamam, olaylar içeri girmesine rağmen çıktı üretmiyor. + +DataStream uygulamanız *Event Time* kullanıyorsa, watermark'larınızın güncellendiğinden emin olun. Watermark üretilmezse, event time pencereleri hiçbir zaman tetiklenmeyebilir ve uygulama sonuç üretmeyebilir. + +Flink'in web arayüzünde (watermark bölümü) watermark'ların ilerleme kaydedip kaydetmediğini kontrol edebilirsiniz. + +### "Insufficient number of network buffers" bildiren bir istisna görüyorum. + +Flink'i çok yüksek bir paralellikle çalıştırırsanız, ağ tamponlarının sayısını artırmanız gerekebilir. + +Varsayılan olarak Flink, minimum 64MB ve maksimum 1GB ile JVM yığın boyutunun %10'unu ağ tamponları için alır. Tüm bu değerleri `taskmanager.network.memory.fraction`, `taskmanager.network.memory.min` ve `taskmanager.network.memory.max` aracılığıyla ayarlayabilirsiniz. 
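Örneğin, bu üç ayar yapılandırmada birlikte şöyle görünebilir (değerler gerçek bir öneri değil, yalnızca varsayımsal bir taslaktır):

```yaml
# Ağ tamponları için JVM yığınının %15'ini ayır;
# alt sınır 128 MB, üst sınır 2 GB olsun.
taskmanager.network.memory.fraction: 0.15
taskmanager.network.memory.min: 128mb
taskmanager.network.memory.max: 2gb
```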
+ +Ayrıntılar için lütfen [Yapılandırma Referansı](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/)'na bakın. + +### İşim HDFS/Hadoop kodundan çeşitli istisnalarla başarısız oluyor. Ne yapabilirim? + +Bunun en yaygın nedeni, Flink'in sınıf yolundaki Hadoop sürümünün, bağlanmak istediğiniz kümenin Hadoop sürümünden farklı olmasıdır (HDFS / YARN). + +Bunu düzeltmenin en kolay yolu, Hadoop içermeyen bir Flink sürümü seçmek ve Hadoop yolunu ve sınıf yolunu kümeden basitçe dışa aktarmaktır. diff --git a/docs/content.tr/how-to-contribute/improve-website.md b/docs/content.tr/how-to-contribute/improve-website.md new file mode 100644 index 0000000000..9ff26b8a23 --- /dev/null +++ b/docs/content.tr/how-to-contribute/improve-website.md @@ -0,0 +1,106 @@ +--- +title: Web Sitesine Katkıda Bulunma +bookCollapseSection: false +weight: 22 +--- + +# Web Sitesini İyileştirme + +[Apache Flink web sitesi](http://flink.apache.org), Apache Flink'i ve topluluğunu tanıtır. Web sitesi aşağıdakiler dahil olmak üzere çeşitli amaçlara hizmet eder: + +- Ziyaretçileri Apache Flink ve özellikleri hakkında bilgilendirmek. +- Ziyaretçileri Flink'i indirmeye ve kullanmaya teşvik etmek. +- Ziyaretçileri toplulukla etkileşime geçmeye teşvik etmek. + +Web sitemizi iyileştirmek için yapılacak her türlü katkıyı memnuniyetle karşılıyoruz. Bu belge, Flink'in web sitesini iyileştirmek için gerekli tüm bilgileri içerir. + +## Web sitesi kaynaklarını edinme + +Apache Flink'in web sitesi, GitHub'da [https://github.com/apache/flink-web](https://github.com/apache/flink-web) adresinde yansıtılan özel bir [git](http://git-scm.com/) deposunda barındırılmaktadır. + +Web sitesi güncellemelerine katkıda bulunmanın en kolay yolu, [GitHub'daki yansıtılmış web sitesi deposunu](https://github.com/apache/flink-web) sağ üst köşedeki fork düğmesine tıklayarak kendi GitHub hesabınıza fork etmektir. GitHub hesabınız yoksa, ücretsiz olarak bir tane oluşturabilirsiniz. 
Ardından, fork'unuzu yerel makinenize klonlayın.

```
git clone https://github.com/<github-kullanıcı-adınız>/flink-web.git
```

`flink-web` dizini, klonlanmış depoyu içerir. Web sitesi, deponun `asf-site` dalında bulunur. Dizine girmek ve `asf-site` dalına geçmek için aşağıdaki komutları çalıştırın.

```
cd flink-web
git checkout asf-site
```

## Dizin yapısı ve dosyalar

Flink'in web sitesi [Markdown](http://daringfireball.net/projects/markdown/) ile yazılmıştır. Markdown, HTML'ye çevrilebilen hafif bir işaretleme dilidir. Markdown'dan statik HTML dosyaları oluşturmak için [Hugo](https://gohugo.io/) kullanıyoruz.

Web sitesi git deposundaki dosya ve dizinler aşağıdaki rollere sahiptir:

- `.md` ile biten tüm dosyalar Markdown dosyalarıdır. Bu dosyalar statik HTML dosyalarına dönüştürülür.
- `docs` dizini, web sitesini oluşturmak ve üretmek için gereken tüm dokümantasyonu, temaları ve diğer içeriği içerir.
- `docs/content/docs` klasörü tüm İngilizce içeriği, `docs/content.zh/docs` ise tüm Çince içeriği içerir.
- `docs/content/posts` tüm blog yazılarını içerir.
- `content/` dizini, Hugo tarafından oluşturulan HTML dosyalarını içerir. Flink web sitesini barındıran Apache Altyapısı HTML içeriğini bu dizinden çektiği için, dosyaları bu dizine yerleştirmek önemlidir. (Committer'lar için: Web sitesi git'ine değişiklik gönderirken `content/` dizinindeki güncellemeleri de gönderin!)

## Dokümantasyonu güncelleme veya genişletme

Web sitesini, Markdown dosyalarını veya CSS dosyaları gibi diğer kaynakları değiştirerek ya da ekleyerek güncelleyebilir ve genişletebilirsiniz. Değişikliklerinizi doğrulamak için oluşturma komut dosyasını önizleme modunda başlatın.

```
./build.sh
```

Komut dosyası, Markdown dosyalarını HTML'ye derler ve yerel bir web sunucusu başlatır. Değişikliklerinizi içeren web sitesini görüntülemek için tarayıcınızı `http://localhost:1313` adresinde açın. Çince çeviri `http://localhost:1313/zh/` adresinde bulunur.
Sunulan web sitesi, herhangi bir dosyayı değiştirip kaydettiğinizde ve tarayıcınızı yenilediğinizde otomatik olarak yeniden derlenir ve güncellenir. + +Dokümantasyonlarınızda veya blog yazılarınızda Flink'in resmi dokümantasyonuna harici bir bağlantı eklemek için, lütfen aşağıdaki sözdizimini kullanın: + +```markdown +{{}} +``` + +Örneğin: + +```markdown +{{}} +``` + +Lütfen geliştirici e-posta listesinde her türlü sorunuzu sormaktan çekinmeyin. + +## Katkınızı gönderme + +Flink projesi, web sitesi katkılarını [GitHub Mirror](https://github.com/apache/flink-web) üzerinden [Pull Request'ler](https://help.github.com/articles/using-pull-requests) olarak kabul eder. Pull request'ler, değişiklikleri içeren bir kod dalına işaret ederek yama sunmanın basit bir yoludur. + +Bir pull request hazırlamak ve göndermek için şu adımları izleyin. + +1. Değişikliklerinizi yerel git deponuza commit edin. Katkınız web sitesinin büyük bir yeniden düzenlemesi olmadığı sürece, lütfen bunu tek bir commit olarak sıkıştırın. + +2. Commit'i, Flink deposunun GitHub'daki fork'unuzun özel bir dalına push edin. + + ``` + git push origin myBranch + ``` + +3. Depo fork'unuzun web sitesine gidin (`https://github.com//flink-web`) ve bir pull request oluşturmaya başlamak için "Create Pull Request" düğmesini kullanın. Temel fork'un `apache/flink-web asf-site` olduğundan ve head fork'un değişikliklerinizi içeren dalı seçtiğinden emin olun. Pull request'e anlamlı bir açıklama verin ve gönderin. + +## Committer bölümü + +**Bu bölüm yalnızca committer'lar için geçerlidir.** + +### ASF web sitesi git depoları + +**ASF yazılabilir**: https://gitbox.apache.org/repos/asf/flink-web.git + +ASF git deposu için kimlik bilgilerinin nasıl ayarlanacağına ilişkin ayrıntılar [burada bağlantılıdır](https://gitbox.apache.org/). + +### Bir pull request'i birleştirme + +Katkıların yalnızca kaynak dosyalar üzerinde yapılması beklenir (`content/` dizinindeki derlenmiş dosyalarda değişiklik yapılmaz). 
Bir web sitesi değişikliğini push etmeden önce, lütfen oluşturma komut dosyasını çalıştırın + +``` +./build.sh +``` + +değişiklikleri `content/` dizinine ek bir commit olarak ekleyin ve değişiklikleri ASF temel deposuna push edin. diff --git a/docs/content.tr/how-to-contribute/overview.md b/docs/content.tr/how-to-contribute/overview.md new file mode 100644 index 0000000000..d61338addd --- /dev/null +++ b/docs/content.tr/how-to-contribute/overview.md @@ -0,0 +1,128 @@ +--- +title: Genel Bakış +bookCollapseSection: false +weight: 16 +--- + +# Nasıl Katkıda Bulunulur + +Apache Flink, açık ve dostane bir topluluk tarafından geliştirilmektedir. Herkes topluluğa katılmaya ve Apache Flink'e katkıda bulunmaya içtenlikle davetlidir. Toplulukla etkileşime geçmenin ve Flink'e katkıda bulunmanın soru sormak, hata raporları göndermek, yeni özellikler önermek, e-posta listelerindeki tartışmalara katılmak, kod veya dokümantasyon katkısında bulunmak, web sitesini geliştirmek veya sürüm adaylarını test etmek gibi çeşitli yolları vardır. + + +## Ne yapmak istiyorsunuz? +

Apache Flink'e katkıda bulunmak, proje için kod yazmaktan daha fazlasını içerir. Aşağıda, projeye yardım etmek için farklı fırsatları listeliyoruz:

| Alan | Daha fazla bilgi |
|------|------------------|
| Hata Bildir | Flink ile ilgili bir sorunu bildirmek için [Flink'in Jira'sını](https://issues.apache.org/jira/projects/FLINK) açın, gerekirse giriş yapın ve üstteki kırmızı **Oluştur** düğmesine tıklayın. Lütfen karşılaştığınız sorun hakkında detaylı bilgi verin ve mümkünse sorunu yeniden oluşturmaya yardımcı olacak bir açıklama ekleyin. |
| Kod Katkısında Bulun | [Kod Katkı Kılavuzu]({{< relref "how-to-contribute/contribute-code" >}})'nu okuyun. |
| Kod İncelemelerine Yardım Et | [Kod İnceleme Kılavuzu]({{< relref "how-to-contribute/reviewing-prs" >}})'nu okuyun. |
| Bir Sürüm Hazırlamaya Yardım Et | Yeni bir sürüm yayınlama şu adımlardan oluşur: 1) yeni bir sürüm adayı oluşturmak ve dev@flink.apache.org listesinde bir oylama başlatmak (genellikle 72 saat sürer), 2) sürüm adayını test etmek ve oylamak (sorun bulunmadıysa +1, sürüm adayında sorunlar varsa -1), 3) sürüm adayında sorunlar varsa 1. adıma geri dönmek; aksi takdirde sürüm yayınlanır. Bir sürüm için test prosedürünü okuyun. |
| Dokümantasyona Katkıda Bulun | [Dokümantasyon Katkı Kılavuzu]({{< relref "how-to-contribute/contribute-documentation" >}})'nu okuyun. |
| Flink Kullanıcılarına Destek Ol | [Kullanıcı e-posta listesindeki]({{< relref "community" >}}#mailing-lists) veya Stack Overflow'daki sorulara yanıt verin. |
| Web Sitesini Geliştir | [Web Sitesi Katkı Kılavuzu]({{< relref "how-to-contribute/improve-website" >}})'nu okuyun. |
| Flink Hakkında Bilgi Yay | Bir Flink buluşması düzenleyin veya buluşmaya katılın, Flink hakkında bir konuşma yapın ya da bir blog yazısı yazın. |

Başka bir sorunuz mu var? Yardım almak için [dev@flink.apache.org]({{< relref "community" >}}#mailing-lists) e-posta listesiyle iletişime geçin!
+ + + +## Daha fazla okuma + + +#### Committer olmak için + +Committer'lar, projenin depolarına yazma erişimi olan topluluk üyeleridir, yani kodu, dokümantasyonu ve web sitesini kendileri değiştirebilir ve diğer katkıları da kabul edebilirler. + +Committer veya PMC üyesi olmak için katı bir protokol yoktur. Yeni committer'lar için adaylar, genellikle aktif katkıda bulunanlar ve topluluk üyeleridir. + +Yeni committer'lar için adaylar, mevcut committer'lar veya PMC üyeleri tarafından önerilir ve PMC tarafından oylanır. + +Eğer committer olmak istiyorsanız, toplulukla etkileşime geçmeli ve yukarıdaki yollardan herhangi biriyle Apache Flink'e katkıda bulunmaya başlamalısınız. Ayrıca diğer committer'larla konuşabilir, tavsiye ve rehberliklerini isteyebilirsiniz. + +#### Committer'larda neleri arıyoruz + +Committer olmak, projeye (topluluk veya teknoloji) önemli bir katkıda bulunan olarak tanınmak ve gelişime yardımcı olacak araçlara sahip olmak anlamına gelir. Committer adayları, uzun bir süre boyunca iyi katkılarda bulunmuş ve katkılarına devam etmek isteyen topluluk üyeleridir. + +Topluluk katkıları, e-posta listesindeki kullanıcı sorularını yanıtlamaya yardımcı olmak, sürüm adaylarını doğrulamak, konuşmalar yapmak, topluluk etkinlikleri düzenlemek ve diğer tanıtım ve topluluk oluşturma biçimlerini içerir. "Apache Yolu", proje topluluğuna güçlü bir şekilde odaklanır ve committer'lar, herhangi bir kod katkısı olmadan bile üstün topluluk katkıları için tanınabilirler. + +Kod/teknoloji katkıları, katkıda bulunulan pull request'leri (patch'ler), tasarım tartışmaları, incelemeler, testler ve hataları tanımlama ve düzeltmede diğer yardımları içerir. Özellikle yapıcı ve yüksek kaliteli tasarım tartışmaları ve diğer katkıda bulunanlara yardım etmek, güçlü göstergelerdir. 
+ +Önceki noktalar umut verici adayları tanımlamanın yollarını verirken, aşağıdakiler herhangi bir committer adayı için "olmazsa olmaz"lardır: + +- Topluluk odaklı olmak: Aday, topluluk yönetiminin liyakat ilkelerini anlar. Her zaman mümkün olduğunca kişisel katkıyı optimize etmezler, ancak mantıklı olduğunda başkalarına yardım eder ve onları güçlendirirler. + +- Bir committer adayının depolara yazma erişimini sorumlu bir şekilde kullanacağına ve şüphe durumunda, muhafazakâr davranacağına güveniriz. Flink büyük bir sistemdir ve committer'ların neyi bildiklerinin ve neyi bilmediklerinin farkında olmaları önemlidir. Şüphe durumunda, committer'lar iyi tanımadıkları kısımlara commit etmek yerine ikinci bir göz istemelidir. (En deneyimli committer'lar bile bu uygulamayı takip ederler.) + +- Diğer topluluk üyelerine saygılı davrandıklarını ve tartışmalarda yapıcı olduklarını göstermişlerdir. + + +#### PMC üyelerinde neleri arıyoruz + +PMC, projenin resmi kontrol organıdır. PMC üyeleri PMC'nin resmi sorumluluklarını (sürümleri ve committer'ların/PMC'nin büyümesini doğrulamak) yerine getirebilmelidir. Onların Flink, teknoloji ve topluluk açısından bir vizyona sahip kişiler olmasını "isteriz". + +Şüpheye yer bırakmamak için, her PMC üyesinin Flink'in sürüm sürecinin tam olarak nasıl çalıştığının tüm ayrıntılarını bilmesi gerekmez (özünü ve ayrıntıları nasıl bulacağını anlamak yeterlidir). Aynı şekilde, her PMC üyesinin vizyoner olması gerekmez. Her üyenin farklı güçlü yönler getirdiğini anlayarak, tüm kısımları iyi kapsayan bir PMC oluşturmaya çalışırız. + +İdeal olarak, Flink'in yönünü (teknoloji ve topluluk) şekillendirme girişiminde bulunmuş ve sürüm oluşturma veya doğrulama gibi resmi süreçleri öğrenme istekliliği göstermiş aktif topluluk üyeleri arasında adaylar buluruz. + +Bir PMC üyesi aynı zamanda bir committer'dır. Adaylar ya zaten committer'dır ya da PMC'ye katıldıklarında otomatik olarak committer olurlar. Bu nedenle, "Committer'larda neleri arıyoruz?" 
bölümü PMC adayları için de geçerlidir. + +Bir PMC üyesinin bir projede büyük bir gücü vardır. Tek bir PMC üyesi, birçok kararı engelleyebilir ve genel olarak projeyi birçok şekilde durdurabilir ve zarar verebilir. Bu nedenle, PMC adaylarının soğukkanlı, yapıcı, destekleyici olduklarına ve zaman zaman "katılmayıp kabul etmeye" istekli olduklarına güvenmeliyiz. + diff --git a/docs/content.tr/how-to-contribute/reviewing-prs.md b/docs/content.tr/how-to-contribute/reviewing-prs.md new file mode 100644 index 0000000000..3fd9c3ed4e --- /dev/null +++ b/docs/content.tr/how-to-contribute/reviewing-prs.md @@ -0,0 +1,89 @@ +--- +title: Pull Request'leri İnceleme +bookCollapseSection: false +weight: 18 +--- + +# Bir Pull Request Nasıl İncelenir + +Bu kılavuz, kod katkılarını incelemek isteyen tüm committer'lar ve katkıda bulunanlar içindir. Çabanız için teşekkür ederiz - iyi incelemeler, bir açık kaynak projesinin en önemli ve kritik parçalarından biridir. Bu kılavuz, topluluğun aşağıdaki şekilde incelemeler yapmasına yardımcı olmayı amaçlamaktadır: + +* Katkıda bulunanlar iyi bir katkı deneyimi yaşarlar. +* İncelemelerimiz yapılandırılmıştır ve bir katkının tüm önemli yönlerini kontrol eder. +* Flink'te yüksek kod kalitesini korumayı sağlarız. +* Katkıda bulunanların ve inceleyicilerin, daha sonra reddedilen bir katkıyı geliştirmek için çok zaman harcadığı durumlardan kaçınırız. + +## İnceleme Kontrol Listesi + +Her inceleme aşağıdaki altı yönü kontrol etmelidir. **Bu yönleri sırayla kontrol etmenizi teşvik ediyoruz; böylece resmi gereksinimler karşılanmadığında veya değişikliği kabul etmek için toplulukta fikir birliği olmadığında, ayrıntılı kod kalitesi incelemelerine zaman harcamaktan kaçınmış olursunuz.** + +### 1. Katkı İyi Tanımlanmış mı? + +Katkının, iyi bir incelemeyi desteklemek için yeterince iyi tanımlanıp tanımlanmadığını kontrol edin. Önemsiz değişiklikler ve düzeltmeler uzun bir açıklama gerektirmez. 
Uygulama tam olarak [Jira'daki veya geliştirme e-posta listesindeki önceki tartışmaya göre]({{< relref "how-to-contribute/contribute-code" >}}#consensus) ise, sadece o tartışmaya kısa bir referans yeterlidir. +Uygulama, fikir birliği tartışmasında üzerinde anlaşılan yaklaşımdan farklıysa, katkının daha fazla incelenmesi için uygulamanın ayrıntılı bir açıklaması gereklidir. + +İşlevselliği veya davranışı değiştiren herhangi bir pull request, bu değişikliklerin büyük resmini açıklamalıdır, böylece inceleyiciler neye bakacaklarını bilirler (ve değişikliğin ne yaptığını anlamak için kodu incelemek zorunda kalmazlar). + + +**Aşağıdaki 2, 3 ve 4. sorular koda bakmadan cevaplanabiliyorsa katkı iyi tanımlanmıştır.** + +----- + +### 2. Değişikliğin veya Özelliğin Flink'e Girmesi Konusunda Fikir Birliği Var mı? + +Bu soru doğrudan bağlantılı Jira sorunuyla cevaplanabilir. Önceden fikir birliği olmadan oluşturulan pull request'ler için, [fikir birliği aramak için Jira'da bir tartışma]({{< relref "how-to-contribute/contribute-code" >}}) gerekecektir. + + +`[hotfix]` pull request'leri için, pull request'te fikir birliği kontrolü yapılması gerekir. + + +----- + +### 3. Katkı Bazı Belirli Committer'lardan Dikkat Gerektiriyor mu ve Bu Committer'lardan Zaman Taahhüdü Var mı? + +Bazı değişiklikler belirli committer'ların dikkatini ve onayını gerektirir. Örneğin, performansa çok duyarlı olan veya dağıtılmış koordinasyon ve hata toleransı üzerinde kritik bir etkisi olan parçalardaki değişiklikler, bileşene derinlemesine aşina olan bir committer'dan girdiye ihtiyaç duyar. + +Kural olarak, Pull Request açıklaması şablondaki "Bu pull request aşağıdaki parçalardan birini potansiyel olarak etkiliyor mu" bölümündeki sorulardan birine 'evet' ile cevap verdiğinde özel dikkat gereklidir. + +Bu soru şu şekilde cevaplanabilir: + +* *Özel dikkat gerektirmez* +* *X için özel dikkat gerektirir (X, örneğin kontrol noktası oluşturma, jobmanager vb.
olabilir).* +* *@committerA, @contributorB tarafından X için özel dikkat var* + +**Pull request özel dikkat gerektiriyorsa, etiketlenen committer'lardan/katkıda bulunanlardan biri nihai onayı vermelidir.** + +---- + +### 4. Uygulama, Üzerinde Anlaşılan Genel Yaklaşımı/Mimariyi Takip Ediyor mu? + +Bu adımda, bir katkının Jira'daki veya e-posta listelerindeki önceki tartışmada üzerinde anlaşılan yaklaşımı takip edip etmediğini kontrol ediyoruz. + +Bu soru mümkün olduğunca Pull Request açıklamasından (veya bağlantılı Jira'dan) cevaplanabilmelidir. + +Değişikliğin bireysel kısımları hakkında yorum yapmak gibi ayrıntılara girmeden önce bunu kontrol etmenizi öneririz. + +---- + +### 5. Genel Kod Kalitesi İyi mi, Flink'te Sürdürmek İstediğimiz Standartları Karşılıyor mu? + +Bu, gerçek değişikliklerin ayrıntılı kod incelemesidir ve şunları kapsar: + +* Değişiklikler Jira biletinde veya tasarım belgesinde açıklanan şeyi yapıyor mu? +* Kod doğru yazılım mühendisliği uygulamalarını takip ediyor mu? Kod doğru, sağlam, bakımı yapılabilir, test edilebilir mi? +* Performansa duyarlı bir kısmı değiştirirken, değişiklikler performans bilinciyle yapılmış mı? +* Değişiklikler testlerle yeterince kapsanmış mı? +* Testler hızlı çalışıyor mu, yani ağır entegrasyon testleri sadece gerektiğinde mi kullanılıyor? +* Kod formatı Flink'in checkstyle desenini takip ediyor mu? +* Kod, ek derleyici uyarıları getirmekten kaçınıyor mu? +* Bağımlılıklar değiştirildiyse, NOTICE dosyaları güncellendi mi? + +Kod yönergeleri [Flink Kod Stili ve Kalite Kılavuzu]({{< relref "how-to-contribute/code-style-and-quality-preamble" >}})'nda bulunabilir. + +---- + +### 6. İngilizce ve Çince Belgeler Güncellendi mi? + +Pull request yeni bir özellik tanıtıyorsa, özellik belgelenmelidir. Flink topluluğu hem İngilizce hem de Çince belgeleri sürdürmektedir. Bu nedenle her iki belge de güncellenmelidir. 
Çince diline aşina değilseniz, lütfen Çince belge çevirisi için `chinese-translation` bileşenine atanmış bir Jira açın ve bunu mevcut Jira sorunu ile ilişkilendirin. Çince diline aşinaysanız, her iki tarafı da bir pull request'te güncellemeniz teşvik edilir. + +[Belgelere nasıl katkıda bulunulacağı]({{< relref "how-to-contribute/contribute-documentation" >}}) hakkında daha fazla bilgi alın. \ No newline at end of file diff --git a/docs/content.tr/material.md b/docs/content.tr/material.md new file mode 100644 index 0000000000..bb133939a6 --- /dev/null +++ b/docs/content.tr/material.md @@ -0,0 +1,111 @@ +--- +title: Material +bold: true +bookCollapseSection: false +bookHidden: true + +tables: + png: + name: "png" + + cols: + - id: "Colored" + name: "Colored logo" + - id: "WhiteFilled" + name: "White filled logo" + - id: "BlackOutline" + name: "Black outline logo" + + rows: + - Colored: + val: "Apache Flink Logo" + html: true + WhiteFilled: + val: "Apache Flink Logo" + html: true + BlackOutline: + val: "Apache Flink Logo" + html: true + - Colored: "**Sizes (px)** [50x50](/img/logo/png/50/color_50.png), [100x100](/img/logo/png/100/flink_squirrel_100_color.png), [200x200](/img/logo/png/200/flink_squirrel_200_color.png), [500x500](/img/logo/png/500/flink_squirrel_500.png), [1000x1000](/img/logo/png/1000/flink_squirrel_1000.png)" + WhiteFilled: "**Sizes (px)**: [50x50](/img/logo/png/50/white_50.png), [100x100](/img/logo/png/100/flink_squirrel_100_white.png), [200x200](/img/logo/png/200/flink_squirrel_200_white.png), [500x500](/img/logo/png/500/flink_squirrel_500_white.png), [1000x1000](/img/logo/png/1000/flink_squirrel_white_1000.png)
" + BlackOutline: "**Sizes (px)**: [50x50](/img/logo/png/50/black_50.png), [100x100](/img/logo/png/100/flink_squirrel_100_black.png), [200x200](/img/logo/png/200/flink_squirrel_200_black.png), [500x500](/img/logo/png/500/flink_squirrel_500_black.png), [1000x1000](/img/logo/png/1000/flink_squirrel_black_1000.png)" + + svg: + name: "svg" + + cols: + - id: "Colored" + name: "Colored logo" + - id: "WhiteFilled" + name: "White filled logo" + - id: "BlackOutline" + name: "Black outline logo" + + rows: + - Colored: + val: "Apache Flink Logo" + html: true + WhiteFilled: + val: "Apache Flink Logo" + html: true + BlackOutline: + val: "Apache Flink Logo" + html: true + - Colored: "Colored logo with black text ([color_black.svg](/img/logo/svg/color_black.svg))" + WhiteFilled: "White filled logo ([white_filled.svg](/img/logo/svg/white_filled.svg))" + BlackOutline: "Black outline logo ([black_outline.svg](/img/logo/svg/black_outline.svg))" + + +--- + + +# Material + +## Apache Flink Logos + +We provide the Apache Flink logo in different sizes and formats. You can [download all variants](/img/logo.zip) (7.4 MB) or just pick the one you need from this page. + +### Portable Network Graphics (PNG) + +{{< table "png" >}} + +You can find more variants of the logo [in this directory](/img/logo/png) or [download all variants](/img/logo.zip) (7.4 MB). + +### Scalable Vector Graphics (SVG) + +{{< table "svg" >}} + +You can find more variants of the logo [in this directory](/img/logo/svg) or [download all variants](/img/logo.zip) (7.4 MB). + +### Photoshop (PSD) + +You can download the logo in PSD format as well: + +- **Colored logo**: [1000x1000](/img/logo/psd/flink_squirrel_1000.psd). +- **Black outline logo with text**: [1000x1000](/img/logo/psd/flink_1000.psd), [5000x5000](/img/logo/psd/flink_5000.psd). + +You can find more variants of the logo [in this directory](/img/logo/psd) or [download all variants](/img/logo.zip) (7.4 MB). 
+ +## Color Scheme + +You can use the provided color scheme which incorporates some colors of the Flink logo: + +- [PDF color scheme](/img/logo/colors/flink_colors.pdf) +- [Powerpoint color scheme](/img/logo/colors/flink_colors.pptx) diff --git a/docs/content.tr/posts/2014-08-26-release-0.6.md b/docs/content.tr/posts/2014-08-26-release-0.6.md new file mode 100644 index 0000000000..24f5d6de56 --- /dev/null +++ b/docs/content.tr/posts/2014-08-26-release-0.6.md @@ -0,0 +1,80 @@ +--- +date: "2014-08-26T10:00:00Z" +title: Apache Flink 0.6 available +aliases: +- /news/2014/08/26/release-0.6.html +--- + +We are happy to announce the availability of Flink 0.6. This is the +first release of the system inside the Apache Incubator and under the +name Flink. Releases up to 0.5 were under the name Stratosphere, the +academic and open source project that Flink originates from. + +## What is Flink? + +Apache Flink is a general-purpose data processing engine for +clusters. It runs on YARN clusters on top of data stored in Hadoop, as +well as stand-alone. Flink currently has programming APIs in Java and +Scala. Jobs are executed via Flink's own runtime engine. Flink +features: + +**Robust in-memory and out-of-core processing:** once read, data stays + in memory as much as possible, and is gracefully de-staged to disk in + the presence of memory pressure from limited memory or other + applications. The runtime is designed to perform very well both in + setups with abundant memory and in setups where memory is scarce. + +**POJO-based APIs:** when programming, you do not have to pack your + data into key-value pairs or some other framework-specific data + model. Rather, you can use arbitrary Java and Scala types to model + your data. + +**Efficient iterative processing:** Flink contains explicit "iterate" operators + that enable very efficient loops over data sets, e.g., for machine + learning and graph applications. 
+ +**A modular system stack:** Flink is not a direct implementation of its + APIs but a layered system. All programming APIs are translated to an + intermediate program representation that is compiled and optimized + via a cost-based optimizer. Lower-level layers of Flink also expose + programming APIs for extending the system. + +**Data pipelining/streaming:** Flink's runtime is designed as a + pipelined data processing engine rather than a batch processing + engine. Operators do not wait for their predecessors to finish in + order to start processing data. This results in very efficient + handling of large data sets. + +## Release 0.6 + +Flink 0.6 builds on the latest Stratosphere 0.5 release. It includes +many bug fixes and improvements that make the system more stable and +robust, as well as breaking API changes. + +The full release notes are available [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12327101). + +Download the release [here](http://flink.incubator.apache.org/downloads.html). + +## Contributors + +* Wilson Cao +* Ufuk Celebi +* Stephan Ewen +* Jonathan Hasenburg +* Markus Holzemer +* Fabian Hueske +* Sebastian Kunert +* Vikhyat Korrapati +* Aljoscha Krettek +* Sebastian Kruse +* Raymond Liu +* Robert Metzger +* Mingliang Qi +* Till Rohrmann +* Henry Saputra +* Chesnay Schepler +* Kostas Tzoumas +* Robert Waury +* Timo Walther +* Daniel Warneke +* Tobias Wiens diff --git a/docs/content.tr/posts/2014-09-26-release-0.6.1.md b/docs/content.tr/posts/2014-09-26-release-0.6.1.md new file mode 100644 index 0000000000..8ae5dde4ea --- /dev/null +++ b/docs/content.tr/posts/2014-09-26-release-0.6.1.md @@ -0,0 +1,13 @@ +--- +date: "2014-09-26T10:00:00Z" +title: Apache Flink 0.6.1 available +aliases: +- /news/2014/09/26/release-0.6.1.html +--- + +We are happy to announce the availability of Flink 0.6.1. + +0.6.1 is a maintenance release, which includes minor fixes across several parts +of the system.
We suggest all users of Flink to work with this newest version. + +[Download](/downloads.html) the release today. \ No newline at end of file diff --git a/docs/content.tr/posts/2014-10-03-upcoming_events.md b/docs/content.tr/posts/2014-10-03-upcoming_events.md new file mode 100644 index 0000000000..fc9aa175df --- /dev/null +++ b/docs/content.tr/posts/2014-10-03-upcoming_events.md @@ -0,0 +1,87 @@ +--- +date: "2014-10-03T10:00:00Z" +title: Upcoming Events +aliases: +- /news/2014/10/03/upcoming_events.html +--- + +We are happy to announce several upcoming Flink events both in Europe and the US. Starting with a **Flink hackathon in Stockholm** (Oct 8-9) and a talk about Flink at the **Stockholm Hadoop User Group** (Oct 8). This is followed by the very first **Flink Meetup in Berlin** (Oct 15). In the US, there will be two Flink Meetup talks: the first one at the **Pasadena Big Data User Group** (Oct 29) and the second one at **Silicon Valley Hands On Programming Events** (Nov 4). + +We are looking forward to seeing you at any of these events. The following is an overview of each event and links to the respective Meetup pages. + +### Flink Hackathon, Stockholm (Oct 8-9) + +The hackathon will take place at KTH/SICS from Oct 8th-9th. You can sign up here: https://docs.google.com/spreadsheet/viewform?formkey=dDZnMlRtZHJ3Z0hVTlFZVjU2MWtoX0E6MA. + +Here is a rough agenda and a list of topics to work upon or look into. Suggestions and more topics are welcome. + +#### Wednesday (8th) + +9:00 - 10:00 Introduction to Apache Flink, System overview, and Dev +environment (by Stephan) + +10:15 - 11:00 Introduction to the topics (Streaming API and system by Gyula +& Marton), (Graphs by Vasia / Martin / Stephan) + +11:00 - 12:30 Happy hacking (part 1) + +12:30 - Lunch (Food will be provided by KTH / SICS. 
A big thank you to them +and also to Paris, for organizing that) + +13:xx - Happy hacking (part 2) + +#### Thursday (9th) + +Happy hacking (continued) + + +#### Suggestions for topics + +##### Streaming + + - Sample streaming applications (e.g. continuous heavy hitters and topics +on the twitter stream) + + - Implement a simple SQL to Streaming program parser. Possibly using +Apache Calcite (http://optiq.incubator.apache.org/) + + - Implement different windowing methods (count-based, time-based, ...) + + - Implement different windowed operations (windowed-stream-join, +windowed-stream-co-group) + + - Streaming state, and interaction with other programs (that access state +of a stream program) + +##### Graph Analysis + + - Prototype a Graph DSL (simple graph building, filters, graph +properties, some algorithms) + + - Prototype abstractions for different Graph processing paradigms +(vertex-centric, partition-centric). + + - Generalize the delta iterations, allow flexible state access. + +### Meetup: Hadoop User Group Talk, Stockholm (Oct 8) + +Hosted by Spotify, opens at 6 PM. + +http://www.meetup.com/stockholm-hug/events/207323222/ + +### 1st Flink Meetup, Berlin (Oct 15) + +We are happy to announce the first Flink meetup in Berlin. You are very welcome to sign up and attend. The event will be held in Betahaus Cafe.
+ +http://www.meetup.com/Apache-Flink-Meetup/events/208227422/ + +### Meetup: Pasadena Big Data User Group (Oct 29) + +http://www.meetup.com/Pasadena-Big-Data-Users-Group/ + +### Meetup: Silicon Valley Hands On Programming Events (Nov 4) + +http://www.meetup.com/HandsOnProgrammingEvents/events/210504392/ + + + diff --git a/docs/content.tr/posts/2014-11-04-release-0.7.0.md b/docs/content.tr/posts/2014-11-04-release-0.7.0.md new file mode 100644 index 0000000000..a8908e04eb --- /dev/null +++ b/docs/content.tr/posts/2014-11-04-release-0.7.0.md @@ -0,0 +1,69 @@ +--- +date: "2014-11-04T10:00:00Z" +title: Apache Flink 0.7.0 available +aliases: +- /news/2014/11/04/release-0.7.0.html +--- + +We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them! + +Download Flink 0.7.0 [here](http://flink.incubator.apache.org/downloads.html) + +See the release changelog [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12327648) + +## Overview of major new features + +**Flink Streaming:** The gem of the 0.7.0 release is undoubtedly Flink Streaming. Available currently in alpha, Flink Streaming provides a Java API on top of Apache Flink that can consume streaming data sources (e.g., from Apache Kafka, Apache Flume, and others) and process them in real time. A dedicated blog post on Flink Streaming and its performance is coming up here soon. You can check out the Streaming programming guide [here]({{< param DocsBaseUrl >}}flink-docs-release-0.7/streaming_guide.html). + +**New Scala API:** The Scala API has been completely rewritten. The Java and Scala APIs have now the same syntax and transformations and will be kept from now on in sync in every future release. 
See the new Scala API [here]({{< param DocsBaseUrl >}}flink-docs-release-0.7/programming_guide.html). + +**Logical key expressions:** You can now specify grouping and joining keys with logical names for member variables of POJO data types. For example, you can join two data sets as ``persons.join(cities).where("zip").equalTo("zipcode")``. Read more [here]({{< param DocsBaseUrl >}}flink-docs-release-0.7/programming_guide.html#specifying-keys). + +**Hadoop MapReduce compatibility:** You can run unmodified Hadoop Mappers and Reducers (mapred API) in Flink, use all Hadoop data types, and read data with all Hadoop InputFormats. + +**Collection-based execution backend:** The collection-based execution backend enables you to execute a Flink job as a simple Java collections program, completely bypassing the Flink runtime and optimizer. This feature is extremely useful for prototyping and for embedding Flink jobs in projects in a very lightweight manner. + +**Record API deprecated:** The (old) Stratosphere Record API has been marked as deprecated and is planned for removal in the 0.9.0 release. + +**BLOB service:** This release contains a new service to distribute jar files and other binary data among the JobManager, TaskManagers and the client. + +**Intermediate data sets:** A major rewrite of the system internals introduces intermediate data sets as first-class citizens. The internal state machine that tracks the distributed tasks has also been completely rewritten for scalability. While this is not visible as a user-facing feature yet, it is the foundation for several upcoming exciting features. + +**Note:** Currently, there is limited support for Java 8 lambdas when compiling and running from an IDE. The problem is due to type erasure and whether Java compilers retain type information. We are currently working with the Eclipse and OpenJDK communities to resolve this.
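The logical key expressions feature above can be sketched outside of Flink as well: conceptually, `where("zip").equalTo("zipcode")` resolves the named POJO fields by reflection and pairs up elements whose key values are equal. The following plain-Java sketch illustrates that idea only; the `Person`/`City` types and the `joinOn` helper are invented for this example and are not part of the Flink API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.BiFunction;

public class LogicalKeyJoinSketch {

    // Illustrative POJOs, stand-ins for user data types (not Flink classes).
    public static class Person {
        public String name;
        public String zip;
        public Person(String name, String zip) { this.name = name; this.zip = zip; }
    }

    public static class City {
        public String city;
        public String zipcode;
        public City(String city, String zipcode) { this.city = city; this.zipcode = zipcode; }
    }

    // Conceptual equivalent of persons.join(cities).where("zip").equalTo("zipcode"):
    // resolve the key fields by name via reflection, then combine elements with equal keys.
    public static <L, R, O> List<O> joinOn(List<L> left, String leftKey,
                                           List<R> right, String rightKey,
                                           BiFunction<L, R, O> combine) {
        List<O> out = new ArrayList<>();
        try {
            for (L l : left) {
                Object lKey = l.getClass().getField(leftKey).get(l);
                for (R r : right) {
                    Object rKey = r.getClass().getField(rightKey).get(r);
                    if (Objects.equals(lKey, rKey)) {
                        out.add(combine.apply(l, r));
                    }
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("key field not found or not accessible", e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person("Ada", "10115"), new Person("Max", "80331"));
        List<City> cities = Arrays.asList(new City("Berlin", "10115"), new City("Munich", "80331"));
        List<String> joined = joinOn(persons, "zip", cities, "zipcode",
                (p, c) -> p.name + " lives in " + c.city);
        System.out.println(joined);  // [Ada lives in Berlin, Max lives in Munich]
    }
}
```

In Flink itself the same join is expressed as `persons.join(cities).where("zip").equalTo("zipcode")`, with the runtime taking care of key extraction and data distribution.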
+ +## Contributors + +* Tamas Ambrus +* Mariem Ayadi +* Marton Balassi +* Daniel Bali +* Ufuk Celebi +* Hung Chang +* David Eszes +* Stephan Ewen +* Judit Feher +* Gyula Fora +* Gabor Hermann +* Fabian Hueske +* Vasiliki Kalavri +* Kristof Kovacs +* Aljoscha Krettek +* Sebastian Kruse +* Sebastian Kunert +* Matyas Manninger +* Robert Metzger +* Mingliang Qi +* Till Rohrmann +* Henry Saputra +* Chesnay Schelper +* Moritz Schubotz +* Hung Sendoh Chang +* Peter Szabo +* Jonas Traub +* Fabian Tschirschnitz +* Artem Tsikiridis +* Kostas Tzoumas +* Timo Walther +* Daniel Warneke +* Tobias Wiens +* Yingjun Wu \ No newline at end of file diff --git a/docs/content.tr/posts/2014-11-18-hadoop-compatibility.md b/docs/content.tr/posts/2014-11-18-hadoop-compatibility.md new file mode 100644 index 0000000000..8ba28339b9 --- /dev/null +++ b/docs/content.tr/posts/2014-11-18-hadoop-compatibility.md @@ -0,0 +1,90 @@ +--- +author: Fabian Hüske +author-twitter: fhueske +date: "2014-11-18T10:00:00Z" +title: Hadoop Compatibility in Flink +aliases: +- /news/2014/11/18/hadoop-compatibility.html +--- + +[Apache Hadoop](http://hadoop.apache.org) is an industry standard for scalable analytical data processing. Many data analysis applications have been implemented as Hadoop MapReduce jobs and run in clusters around the world. Apache Flink can be an alternative to MapReduce and improves it in many dimensions. Among other features, Flink provides much better performance and offers APIs in Java and Scala, which are very easy to use. Similar to Hadoop, Flink’s APIs provide interfaces for Mapper and Reducer functions, as well as Input- and OutputFormats along with many more operators. While being conceptually equivalent, Hadoop’s MapReduce and Flink’s interfaces for these functions are unfortunately not source compatible. + +## Flink’s Hadoop Compatibility Package + +
+ +
+ +To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. This package was developed as part of a [Google Summer of Code](https://developers.google.com/open-source/soc/) 2014 project. + +With the Hadoop Compatibility package, you can reuse all your Hadoop + +* ``InputFormats`` (mapred and mapreduce APIs) +* ``OutputFormats`` (mapred and mapreduce APIs) +* ``Mappers`` (mapred API) +* ``Reducers`` (mapred API) + +in Flink programs without changing a line of code. Moreover, Flink also natively supports all Hadoop data types (``Writables`` and ``WritableComparable``). + +The following code snippet shows a simple Flink WordCount program that solely uses Hadoop data types, InputFormat, OutputFormat, Mapper, and Reducer functions. + +```java + +// Definition of Hadoop Mapper function +public class Tokenizer implements Mapper<LongWritable, Text, Text, LongWritable> { ... } +// Definition of Hadoop Reducer function +public class Counter implements Reducer<Text, LongWritable, Text, LongWritable> { ... } + +public static void main(String[] args) { + final String inputPath = args[0]; + final String outputPath = args[1]; + + final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + + // Setup Hadoop’s TextInputFormat + HadoopInputFormat<LongWritable, Text> hadoopInputFormat = + new HadoopInputFormat<LongWritable, Text>( + new TextInputFormat(), LongWritable.class, Text.class, new JobConf()); + TextInputFormat.addInputPath(hadoopInputFormat.getJobConf(), new Path(inputPath)); + + // Read a DataSet with the Hadoop InputFormat + DataSet<Tuple2<LongWritable, Text>> text = env.createInput(hadoopInputFormat); + DataSet<Tuple2<Text, LongWritable>> words = text + // Wrap Tokenizer Mapper function + .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(new Tokenizer())) + .groupBy(0) + // Wrap Counter Reducer function (used as Reducer and Combiner) + .reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>( + new Counter(), new Counter())); + + // Setup Hadoop’s TextOutputFormat + HadoopOutputFormat<Text, LongWritable> hadoopOutputFormat = + new HadoopOutputFormat<Text, LongWritable>( + new TextOutputFormat<Text, LongWritable>(), new JobConf()); + hadoopOutputFormat.getJobConf().set("mapred.textoutputformat.separator", " "); + TextOutputFormat.setOutputPath(hadoopOutputFormat.getJobConf(), new Path(outputPath)); + + // Output & Execute + words.output(hadoopOutputFormat); + env.execute("Hadoop Compat WordCount"); +} + +``` + +As you can see, Flink represents Hadoop key-value pairs as `Tuple2<key, value>` tuples. Note that the program uses Flink’s `groupBy()` transformation to group data on the key field (field 0 of the `Tuple2`) before it is given to the Reducer function. At the moment, the compatibility package does not evaluate custom Hadoop partitioners, sorting comparators, or grouping comparators. + +Hadoop functions can be used at any position within a Flink program and of course also be mixed with native Flink functions.
This means that instead of assembling a workflow of Hadoop jobs in an external driver method or using a workflow scheduler such as [Apache Oozie](http://oozie.apache.org), you can implement an arbitrarily complex Flink program consisting of multiple Hadoop Input- and OutputFormats, Mapper and Reducer functions. When executing such a Flink program, data will be pipelined between your Hadoop functions and will not be written to HDFS just for the purpose of data exchange.
+ +
+ +## What comes next? + +While the Hadoop compatibility package is already very useful, we are currently working on a dedicated Hadoop Job operation to embed and execute Hadoop jobs as a whole in Flink programs, including their custom partitioning, sorting, and grouping code. With this feature, you will be able to chain multiple Hadoop jobs, mix them with Flink functions and other operations such as [Spargel]({{< param DocsBaseUrl >}}flink-docs-release-0.7/spargel_guide.html) operations (Pregel/Giraph-style jobs). + +## Summary + +Flink lets you reuse a lot of the code you wrote for Hadoop MapReduce, including all data types, all Input- and OutputFormats, and Mappers and Reducers of the mapred API. Hadoop functions can be used within Flink programs and mixed with all other Flink functions. Due to Flink’s pipelined execution, Hadoop functions can be assembled arbitrarily without data exchange via HDFS. Moreover, the Flink community is currently working on a dedicated Hadoop Job operation to support the execution of Hadoop jobs as a whole. + +If you want to use Flink’s Hadoop compatibility package, check out our [documentation]({{< param DocsBaseUrl >}}flink-docs-master/apis/batch/hadoop_compatibility.html). diff --git a/docs/content.tr/posts/2015-01-06-december-in-flink.md b/docs/content.tr/posts/2015-01-06-december-in-flink.md new file mode 100644 index 0000000000..4b4e68690d --- /dev/null +++ b/docs/content.tr/posts/2015-01-06-december-in-flink.md @@ -0,0 +1,62 @@ +--- +date: "2015-01-06T10:00:00Z" +title: December 2014 in the Flink community +aliases: +- /news/2015/01/06/december-in-flink.html +--- + +This is the first blog post of a “newsletter”-like series where we give a summary of the monthly activity in the Flink community. As the Flink project grows, this can serve as a "tl;dr" for people that are not following the Flink dev and user mailing lists, or those that are simply overwhelmed by the traffic.
+ + +### Flink graduation + +The biggest news is that the Apache board approved Flink as a top-level Apache project! The Flink team is working closely with the Apache press team for an official announcement, so stay tuned for details! + +### New Flink website + +The [Flink website](http://flink.apache.org) got a total make-over, both in terms of appearance and content. + +### Flink IRC channel + +A new IRC channel called #flink was created at irc.freenode.org. An easy way to access the IRC channel is through the [web client](http://webchat.freenode.net/). Feel free to stop by to ask anything or share your ideas about Apache Flink! + +### Meetups and Talks + +Apache Flink was presented in the [Amsterdam Hadoop User Group](http://www.meetup.com/Netherlands-Hadoop-User-Group/events/218635152) + +## Notable code contributions + +**Note:** Code contributions listed here may not be part of a release or even the current snapshot yet. + +### [Streaming Scala API](https://github.com/apache/incubator-flink/pull/275) + +The Flink Streaming Java API recently got its Scala counterpart. Once merged, Flink Streaming users can use both Scala and Java for their development. The Flink Streaming Scala API is built as a thin layer on top of the Java API, making sure that the APIs are kept easily in sync. + +### [Intermediate datasets](https://github.com/apache/incubator-flink/pull/254) + +This pull request introduces a major change in the Flink runtime. Currently, the Flink runtime is based on the notion of operators that exchange data through channels. With the PR, intermediate data sets that are produced by operators become first-class citizens in the runtime. While this does not have any user-facing impact yet, it lays the groundwork for a slew of future features such as blocking execution, fine-grained fault-tolerance, and more efficient data sharing between cluster and client. 
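The idea behind making intermediate data sets first-class citizens can be illustrated with a toy model: a result is produced once, cached, and then shared by any number of downstream consumers instead of being recomputed. This is only a conceptual sketch; the `IntermediateDataSet` class below is invented for illustration and is unrelated to Flink's actual runtime classes:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class IntermediateDataSetSketch {

    // Toy model: an intermediate result that is produced once and shared by all consumers.
    public static class IntermediateDataSet<T> {
        private final Supplier<List<T>> producer;
        private List<T> materialized;  // cached after the first consumption
        private int producerRuns = 0;

        public IntermediateDataSet(Supplier<List<T>> producer) { this.producer = producer; }

        public List<T> consume() {
            if (materialized == null) {        // lazily produce exactly once
                materialized = producer.get();
                producerRuns++;
            }
            return materialized;               // later consumers share the cached data
        }

        public int producerRuns() { return producerRuns; }
    }

    public static void main(String[] args) {
        IntermediateDataSet<Integer> squares =
                new IntermediateDataSet<>(() -> Arrays.asList(1, 4, 9));

        // Two downstream "operators" consume the same intermediate result;
        // the producer runs only once, so data is shared rather than recomputed.
        int sum = squares.consume().stream().mapToInt(Integer::intValue).sum();
        int max = squares.consume().stream().mapToInt(Integer::intValue).max().getAsInt();

        System.out.println(sum + " " + max + " producerRuns=" + squares.producerRuns());
        // prints: 14 9 producerRuns=1
    }
}
```

A materialized, addressable result like this is what makes features such as blocking execution and fine-grained fault tolerance possible: a consumer can be restarted against the cached result without re-running the producer.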
+ +### [Configurable execution mode](https://github.com/apache/incubator-flink/pull/259) + +This pull request allows the user to change the object-reuse behaviour. Before this pull request, some operations would reuse objects passed to the user function while others would always create new objects. This introduces a system-wide switch so that all operators consistently either reuse objects or create new ones. + +### [Distributed Coordination via Akka](https://github.com/apache/incubator-flink/pull/149) + +Another major change is a complete rewrite of the JobManager / TaskManager components in Scala. In addition to that, the old RPC service was replaced by Actors, using the Akka framework. + +### [Sorting of very large records](https://github.com/apache/incubator-flink/pull/249) + +Flink's internal sort-algorithms were improved to better handle large records (multiple 100s of megabytes or larger). Previously, the system in some cases held instances of multiple large records, resulting in high memory consumption and JVM heap thrashing. Through this fix, large records are streamed through the operators, reducing the memory consumption and GC pressure. The system now requires much less memory to support algorithms that work on such large records. + +### [Kryo Serialization as the new default fallback](https://github.com/apache/incubator-flink/pull/271) + +Flink’s built-in type serialization framework handles all common types very efficiently. Prior versions used Avro to serialize types that the built-in framework could not handle. +Flink’s serialization system has improved a lot over time and by now surpasses the capabilities of Avro in many cases. Kryo now serves as the default fallback serialization framework, supporting a much broader range of types. + +### [Hadoop FileSystem support](https://github.com/apache/incubator-flink/pull/268) + +This change permits users to use all file systems supported by Hadoop with Flink.
In practice this means that users can use Flink with Tachyon, Google Cloud Storage (including out-of-the-box Flink YARN support on Google Compute Cloud), FTP, and all the other file system implementations for Hadoop. + +## Heading to the 0.8.0 release + +The community is working hard together with the Apache infra team to migrate the Flink infrastructure to a top-level project. At the same time, the Flink community is working on the Flink 0.8.0 release, which should be out very soon. \ No newline at end of file diff --git a/docs/content.tr/posts/2015-01-21-release-0.8.md b/docs/content.tr/posts/2015-01-21-release-0.8.md new file mode 100644 index 0000000000..735aa4a271 --- /dev/null +++ b/docs/content.tr/posts/2015-01-21-release-0.8.md @@ -0,0 +1,76 @@ +--- +date: "2015-01-21T10:00:00Z" +title: Apache Flink 0.8.0 available +aliases: +- /news/2015/01/21/release-0.8.html +--- + + +We are pleased to announce the availability of Flink 0.8.0. This release includes new user-facing features as well as performance and bug fixes, extends the support for filesystems, and introduces the Scala API and flexible windowing semantics for Flink Streaming. A total of 33 people have contributed to this release, a big thanks to all of them! + +[Download Flink 0.8.0](http://www.apache.org/dyn/closer.cgi/flink/flink-0.8.0/flink-0.8.0-bin-hadoop2.tgz) + +[See the release changelog](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12328699) + +## Overview of major new features + + + - **Extended filesystem support**: The former `DistributedFileSystem` interface has been generalized to `HadoopFileSystem`, now supporting all subclasses of `org.apache.hadoop.fs.FileSystem`. This allows users to use all file systems supported by Hadoop with Apache Flink.
+[See connecting to other systems]({{< param DocsBaseUrl >}}flink-docs-release-0.8/example_connectors.html) + + - **Streaming Scala API**: As an alternative to the existing Java API, Streaming is now also programmable in Scala. The Java and Scala APIs now have the same syntax and transformations and will be kept in sync in every future release. + + - **Streaming windowing semantics**: The new windowing API offers an expressive way to define custom logic for triggering the execution of a stream window and removing elements. The new features include out-of-the-box support for windows based on logical or physical time and data-driven properties on the events themselves, among others. [Read more here]({{< param DocsBaseUrl >}}flink-docs-release-0.8/streaming_guide.html#window-operators) + + - **Mutable and immutable objects in runtime**: All Flink versions before 0.8.0 were always passing the same objects to functions written by users. This is a common performance optimization, also used in other systems such as Hadoop. + However, this is error-prone for new users because one has to carefully check that references to the object aren’t kept in the user function. Starting from 0.8.0, Flink allows users to configure a mode which disables that mechanism. + + - **Performance and usability improvements**: The new Apache Flink 0.8.0 release brings several new features which will significantly improve the performance and the usability of the system. Amongst others, these features include: + - Improved input split assignment which maximizes computation locality + - Smart broadcasting mechanism which minimizes network I/O + - Custom partitioners which let the user control how the data is partitioned within the cluster. This helps to prevent data skewness and allows implementing highly efficient algorithms.
+   - The coGroup operator now supports group sorting for its inputs
+
+ - **Kryo is the new fallback serializer**: Apache Flink has a sophisticated type analysis and serialization framework that is able to handle commonly used types very efficiently.
+ In addition, there is a fallback serializer for types which are not supported. Older versions of Flink used the reflective [Avro](http://avro.apache.org/) serializer for that purpose. With this release, Flink uses the powerful [Kryo](https://github.com/EsotericSoftware/kryo) and twitter-chill libraries to support types such as Java collections and Scala-specific types.
+
+ - **Hadoop 2.2.0+ is now the default Hadoop dependency**: With Flink 0.8.0 we made the “hadoop2” build profile the default build for Flink. This means that all users using Hadoop 1 (0.2X or 1.2.X versions) have to specify version “0.8.0-hadoop1” in their pom files.
+
+ - **HBase module updated**: The HBase version has been updated to 0.98.6.1. Also, HBase is now available in both the Hadoop 1 and Hadoop 2 profiles of Flink.
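The object-reuse pitfall described under “Mutable and immutable objects in runtime” above can be sketched in a few lines of plain Java. The `Record` type and helper methods below are hypothetical stand-ins, not Flink's runtime classes; the point is only that keeping a reference to a reused mutable object silently corrupts collected results:

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseDemo {

    // A hypothetical mutable record, standing in for a framework's reused objects.
    static class Record {
        int value;
    }

    // Buggy: keeps references to the single reused object.
    static List<Record> collectByReference(int[] values) {
        List<Record> out = new ArrayList<>();
        Record reused = new Record();          // one object, reused per "invocation"
        for (int v : values) {
            reused.value = v;
            out.add(reused);                   // BUG: same reference added every time
        }
        return out;
    }

    // Correct: copies the reused object before keeping it.
    static List<Record> collectByCopy(int[] values) {
        List<Record> out = new ArrayList<>();
        Record reused = new Record();
        for (int v : values) {
            reused.value = v;
            Record copy = new Record();        // defensive copy
            copy.value = reused.value;
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3};
        // All entries end up holding the last value when references are kept.
        System.out.println(collectByReference(input).get(0).value); // 3, not 1
        System.out.println(collectByCopy(input).get(0).value);      // 1, as expected
    }
}
```

Disabling object reuse trades a bit of performance for exactly the safety that `collectByCopy` buys here.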
+ + +## Contributors + + - Marton Balassi + - Daniel Bali + - Carsten Brandt + - Moritz Borgmann + - Stefan Bunk + - Paris Carbone + - Ufuk Celebi + - Nils Engelbach + - Stephan Ewen + - Gyula Fora + - Gabor Hermann + - Fabian Hueske + - Vasiliki Kalavri + - Johannes Kirschnick + - Aljoscha Krettek + - Suneel Marthi + - Robert Metzger + - Felix Neutatz + - Chiwan Park + - Flavio Pompermaier + - Mingliang Qi + - Shiva Teja Reddy + - Till Rohrmann + - Henry Saputra + - Kousuke Saruta + - Chesney Schepler + - Erich Schubert + - Peter Szabo + - Jonas Traub + - Kostas Tzoumas + - Timo Walther + - Daniel Warneke + - Chen Xu \ No newline at end of file diff --git a/docs/content.tr/posts/2015-02-04-january-in-flink.md b/docs/content.tr/posts/2015-02-04-january-in-flink.md new file mode 100644 index 0000000000..fe406f85d5 --- /dev/null +++ b/docs/content.tr/posts/2015-02-04-january-in-flink.md @@ -0,0 +1,48 @@ +--- +date: "2015-02-04T10:00:00Z" +title: January 2015 in the Flink community +aliases: +- /news/2015/02/04/january-in-flink.html +--- + +Happy 2015! Here is a (hopefully digestible) summary of what happened last month in the Flink community. + +### 0.8.0 release + +Flink 0.8.0 was released. See [here](http://flink.apache.org/news/2015/01/21/release-0.8.html) for the release notes. + +### Flink roadmap + +The community has published a [roadmap for 2015](https://cwiki.apache.org/confluence/display/FLINK/Flink+Roadmap) on the Flink wiki. Check it out to see what is coming up in Flink, and pick up an issue to contribute! + +### Articles in the press + +The Apache Software Foundation [announced](https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces69) Flink as a Top-Level Project. 
The announcement was picked up by the media, e.g., [here](http://sdtimes.com/inside-apache-software-foundations-newest-top-level-project-apache-flink/?utm_content=11232092&utm_medium=social&utm_source=twitter), [here](http://www.datanami.com/2015/01/12/apache-flink-takes-route-distributed-data-processing/), and [here](http://i-programmer.info/news/197-data-mining/8176-flink-reaches-top-level-status.html). + +### Hadoop Summit + +A submitted abstract on Flink Streaming won the community vote at “The Future of Hadoop” track. + +### Meetups and talks + +Flink was presented at the [Paris Hadoop User Group](http://www.meetup.com/Hadoop-User-Group-France/events/219778022/), the [Bay Area Hadoop User Group](http://www.meetup.com/hadoop/events/167785202/), the [Apache Tez User Group](http://www.meetup.com/Apache-Tez-User-Group/events/219302692/), and [FOSDEM 2015](https://fosdem.org/2015/schedule/track/graph_processing/). The January [Flink meetup in Berlin](http://www.meetup.com/Apache-Flink-Meetup/events/219639984/) had talks on recent community updates and new features. + +## Notable code contributions + +**Note:** Code contributions listed here may not be part of a release or even the Flink master repository yet. + +### [Using off-heap memory](https://github.com/apache/flink/pull/290) + +This pull request enables Flink to use off-heap memory for its internal memory uses (sort, hash, caching of intermediate data sets). + +### [Gelly, Flink’s Graph API](https://github.com/apache/flink/pull/335) + +This pull request introduces Gelly, Flink’s brand new Graph API. Gelly offers a native graph programming abstraction with functionality for vertex-centric programming, as well as available graph algorithms. See [this slide set](http://www.slideshare.net/vkalavri/largescale-graph-processing-with-apache-flink-graphdevroom-fosdem15) for an overview of Gelly. 
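The off-heap memory pull request mentioned above targets Flink's internal memory manager; in plain Java, off-heap memory is typically obtained with `ByteBuffer.allocateDirect`. The following is a minimal, hypothetical illustration of the idea (not Flink's actual implementation): data is serialized into memory outside the JVM heap and read back, which keeps large working sets away from the garbage collector.

```java
import java.nio.ByteBuffer;

public class OffHeapSketch {

    // Allocate a buffer outside the JVM heap and write/read primitive values.
    static long sumDirect(int[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);                 // serialize into off-heap memory
        }
        buf.flip();                        // switch the buffer from writing to reading
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.getInt();           // deserialize back
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumDirect(new int[]{1, 2, 3, 4})); // prints 10
    }
}
```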
+ +### [Semantic annotations](https://github.com/apache/flink/pull/311) + +Semantic annotations are a powerful mechanism to expose information about the behavior of Flink functions to Flink’s optimizer. The optimizer can leverage this information to generate more efficient execution plans. For example the output of a Reduce operator that groups on the second field of a tuple is still partitioned on that field if the Reduce function does not modify the value of the second field. By exposing this information to the optimizer, the optimizer can generate plans that avoid expensive data shuffling and reuse the partitioned output of Reduce. Semantic annotations can be defined for most data types, including (nested) tuples and POJOs. See the snapshot documentation for details (not online yet). + +### [New YARN client](https://github.com/apache/flink/pull/292) + +The improved YARN client of Flink now allows users to deploy Flink on YARN for executing a single job. Older versions only supported a long-running YARN session. The code of the YARN client has been refactored to provide an (internal) Java API for controlling YARN clusters more easily. diff --git a/docs/content.tr/posts/2015-02-09-streaming-example.md b/docs/content.tr/posts/2015-02-09-streaming-example.md new file mode 100644 index 0000000000..5a809af276 --- /dev/null +++ b/docs/content.tr/posts/2015-02-09-streaming-example.md @@ -0,0 +1,681 @@ +--- +date: "2015-02-09T12:00:00Z" +title: Introducing Flink Streaming +aliases: +- /news/2015/02/09/streaming-example.html +--- + +This post is the first of a series of blog posts on Flink Streaming, +the recent addition to Apache Flink that makes it possible to analyze +continuous data sources in addition to static files. Flink Streaming +uses the pipelined Flink engine to process data streams in real time +and offers a new API including definition of flexible windows. 
+ +In this post, we go through an example that uses the Flink Streaming +API to compute statistics on stock market data that arrive +continuously and combine the stock market data with Twitter streams. +See the [Streaming Programming +Guide]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/index.html) for a +detailed presentation of the Streaming API. + +First, we read a bunch of stock price streams and combine them into +one stream of market data. We apply several transformations on this +market data stream, like rolling aggregations per stock. Then we emit +price warning alerts when the prices are rapidly changing. Moving +towards more advanced features, we compute rolling correlations +between the market data streams and a Twitter stream with stock mentions. + +For running the example implementation please use the *0.9-SNAPSHOT* +version of Flink as a dependency. The full example code base can be +found [here](https://github.com/mbalassi/flink/blob/stockprices/flink-staging/flink-streaming/flink-streaming-examples/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/StockPrices.scala) in Scala and [here](https://github.com/mbalassi/flink/blob/stockprices/flink-staging/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/windowing/StockPrices.java) in Java7. + + + + + +[Back to top](#top) + +Reading from multiple inputs +--------------- + +First, let us create the stream of stock prices: + +1. Read a socket stream of stock prices +1. Parse the text in the stream to create a stream of `StockPrice` objects +1. Add four other sources tagged with the stock symbol. +1. Finally, merge the streams to create a unified stream. + +Reading from multiple inputs + +
+
+{{< highlight scala >}}
+def main(args: Array[String]) {
+
+  val env = StreamExecutionEnvironment.getExecutionEnvironment
+
+  //Read from a socket stream and map it to StockPrice objects
+  val socketStockStream = env.socketTextStream("localhost", 9999).map(x => {
+    val split = x.split(",")
+    StockPrice(split(0), split(1).toDouble)
+  })
+
+  //Generate other stock streams
+  val SPX_Stream = env.addSource(generateStock("SPX")(10) _)
+  val FTSE_Stream = env.addSource(generateStock("FTSE")(20) _)
+  val DJI_Stream = env.addSource(generateStock("DJI")(30) _)
+  val BUX_Stream = env.addSource(generateStock("BUX")(40) _)
+
+  //Merge all stock streams together
+  val stockStream = socketStockStream.merge(SPX_Stream, FTSE_Stream,
+    DJI_Stream, BUX_Stream)
+
+  stockStream.print()
+
+  env.execute("Stock stream")
+}
+{{< / highlight >}}
+
+
+{{< highlight java >}}
+public static void main(String[] args) throws Exception {
+
+    final StreamExecutionEnvironment env =
+        StreamExecutionEnvironment.getExecutionEnvironment();
+
+    //Read from a socket stream and map it to StockPrice objects
+    DataStream<StockPrice> socketStockStream = env
+        .socketTextStream("localhost", 9999)
+        .map(new MapFunction<String, StockPrice>() {
+            private String[] tokens;
+
+            @Override
+            public StockPrice map(String value) throws Exception {
+                tokens = value.split(",");
+                return new StockPrice(tokens[0],
+                    Double.parseDouble(tokens[1]));
+            }
+        });
+
+    //Generate other stock streams
+    DataStream<StockPrice> SPX_stream = env.addSource(new StockSource("SPX", 10));
+    DataStream<StockPrice> FTSE_stream = env.addSource(new StockSource("FTSE", 20));
+    DataStream<StockPrice> DJI_stream = env.addSource(new StockSource("DJI", 30));
+    DataStream<StockPrice> BUX_stream = env.addSource(new StockSource("BUX", 40));
+
+    //Merge all stock streams together
+    DataStream<StockPrice> stockStream = socketStockStream
+        .merge(SPX_stream, FTSE_stream, DJI_stream, BUX_stream);
+
+    stockStream.print();
+
+    env.execute("Stock stream");
+}
+{{< / highlight >}}
+
+
+See
+[here]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/index.html#data-sources)
+for how you can create streaming sources for Flink Streaming
+programs. Flink, of course, also supports reading streams from
+[external
+sources]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/connectors/index.html)
+such as Apache Kafka, Apache Flume, RabbitMQ, and others. For the sake
+of this example, the data streams are simply generated using the
+`generateStock` method:
+
+
+{{< highlight scala >}} +val symbols = List("SPX", "FTSE", "DJI", "DJT", "BUX", "DAX", "GOOG") + +case class StockPrice(symbol: String, price: Double) + +def generateStock(symbol: String)(sigma: Int)(out: Collector[StockPrice]) = { + var price = 1000. + while (true) { + price = price + Random.nextGaussian * sigma + out.collect(StockPrice(symbol, price)) + Thread.sleep(Random.nextInt(200)) + } +} +{{< / highlight >}} +
+
+{{< highlight java >}} +private static final ArrayList SYMBOLS = new ArrayList( + Arrays.asList("SPX", "FTSE", "DJI", "DJT", "BUX", "DAX", "GOOG")); + +public static class StockPrice implements Serializable { + + public String symbol; + public Double price; + + public StockPrice() { + } + + public StockPrice(String symbol, Double price) { + this.symbol = symbol; + this.price = price; + } + + @Override + public String toString() { + return "StockPrice{" + + "symbol='" + symbol + '\'' + + ", count=" + price + + '}'; + } +} + +public final static class StockSource implements SourceFunction { + + private Double price; + private String symbol; + private Integer sigma; + + public StockSource(String symbol, Integer sigma) { + this.symbol = symbol; + this.sigma = sigma; + } + + @Override + public void invoke(Collector collector) throws Exception { + price = DEFAULT_PRICE; + Random random = new Random(); + + while (true) { + price = price + random.nextGaussian() * sigma; + collector.collect(new StockPrice(symbol, price)); + Thread.sleep(random.nextInt(200)); + } + } +} +{{< / highlight >}} +
+
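The Gaussian random walk implemented by `StockSource` above can be reproduced in self-contained Java. The class and method names below are hypothetical; unlike the Flink source, which runs forever and emits through a `Collector`, this sketch uses a fixed seed and a bounded count so it terminates:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomWalkSketch {

    // Produce `n` prices following a Gaussian random walk starting at `start`,
    // mirroring what the stock sources above emit.
    static List<Double> generate(String symbol, double start, int sigma, int n, long seed) {
        Random random = new Random(seed);     // seeded for reproducibility
        List<Double> prices = new ArrayList<>();
        double price = start;
        for (int i = 0; i < n; i++) {
            price = price + random.nextGaussian() * sigma;
            prices.add(price);
        }
        return prices;
    }

    public static void main(String[] args) {
        List<Double> spx = generate("SPX", 1000.0, 10, 5, 42L);
        System.out.println(spx.size()); // 5 generated prices
    }
}
```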
+
+To read from the text socket stream, please make sure that you have a
+socket running. For the sake of the example, executing the following
+command in a terminal does the job. You can get
+[netcat](http://netcat.sourceforge.net/) if it is not available
+on your machine.
+
+```
+nc -lk 9999
+```
+
+If we execute the program from our IDE, we can see the stock
+prices being generated:
+
+```
+INFO Job execution switched to status RUNNING.
+INFO Socket Stream(1/1) switched to SCHEDULED
+INFO Socket Stream(1/1) switched to DEPLOYING
+INFO Custom Source(1/1) switched to SCHEDULED
+INFO Custom Source(1/1) switched to DEPLOYING
+…
+1> StockPrice{symbol='SPX', count=1011.3405732645239}
+2> StockPrice{symbol='SPX', count=1018.3381290039248}
+1> StockPrice{symbol='DJI', count=1036.7454894073978}
+3> StockPrice{symbol='DJI', count=1135.1170217478427}
+3> StockPrice{symbol='BUX', count=1053.667523187687}
+4> StockPrice{symbol='BUX', count=1036.552601487263}
+```
+
+[Back to top](#top)
+
+Window aggregations
+---------------
+
+We first compute aggregations on time-based windows of the
+data. Flink provides [flexible windowing semantics]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/windows.html) where windows can
+also be defined based on record counts or any custom user-defined
+logic.
+
+We partition our stream into windows of 10 seconds and slide the
+window every 5 seconds. We compute three statistics every 5 seconds.
+The first is the minimum price of all stocks, the second is the
+maximum price per stock, and the third is the mean stock price
+(using a map window function). Aggregations and groupings can be
+performed on named fields of POJOs, making the code more readable.
+
+Basic windowing aggregations
+
+ +
+ +{{< highlight scala >}} +//Define the desired time window +val windowedStream = stockStream + .window(Time.of(10, SECONDS)).every(Time.of(5, SECONDS)) + +//Compute some simple statistics on a rolling window +val lowest = windowedStream.minBy("price") +val maxByStock = windowedStream.groupBy("symbol").maxBy("price") +val rollingMean = windowedStream.groupBy("symbol").mapWindow(mean _) + +//Compute the mean of a window +def mean(ts: Iterable[StockPrice], out: Collector[StockPrice]) = { + if (ts.nonEmpty) { + out.collect(StockPrice(ts.head.symbol, ts.foldLeft(0: Double)(_ + _.price) / ts.size)) + } +} +{{< / highlight >}} + +
+ +
+
+{{< highlight java >}}
+//Define the desired time window
+WindowedDataStream<StockPrice> windowedStream = stockStream
+    .window(Time.of(10, TimeUnit.SECONDS))
+    .every(Time.of(5, TimeUnit.SECONDS));
+
+//Compute some simple statistics on a rolling window
+DataStream<StockPrice> lowest = windowedStream.minBy("price").flatten();
+DataStream<StockPrice> maxByStock = windowedStream.groupBy("symbol")
+    .maxBy("price").flatten();
+DataStream<StockPrice> rollingMean = windowedStream.groupBy("symbol")
+    .mapWindow(new WindowMean()).flatten();
+
+//Compute the mean of a window
+public final static class WindowMean implements
+    WindowMapFunction<StockPrice, StockPrice> {
+
+    private Double sum = 0.0;
+    private Integer count = 0;
+    private String symbol = "";
+
+    @Override
+    public void mapWindow(Iterable<StockPrice> values, Collector<StockPrice> out)
+        throws Exception {
+
+        if (values.iterator().hasNext()) {
+            for (StockPrice sp : values) {
+                sum += sp.price;
+                symbol = sp.symbol;
+                count++;
+            }
+            out.collect(new StockPrice(symbol, sum / count));
+        }
+    }
+}
+{{< / highlight >}}
+ +
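The rolling mean above boils down to a sliding-window average. A self-contained sketch, using count-based windows as a stand-in for the 10-second/5-second time windows (the class and method names are hypothetical, not Flink's API):

```java
import java.util.ArrayList;
import java.util.List;

public class WindowMeanSketch {

    // Mean of every sliding window of `size` elements, advancing by `slide`.
    // (Count-based stand-in for the 10s windows sliding by 5s used above.)
    static List<Double> slidingMeans(double[] prices, int size, int slide) {
        List<Double> means = new ArrayList<>();
        for (int start = 0; start + size <= prices.length; start += slide) {
            double sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += prices[i];
            }
            means.add(sum / size);
        }
        return means;
    }

    public static void main(String[] args) {
        double[] prices = {1, 2, 3, 4, 5, 6};
        // windows [1..4] and [3..6] -> means 2.5 and 4.5
        System.out.println(slidingMeans(prices, 4, 2)); // [2.5, 4.5]
    }
}
```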
+ +Let us note that to print a windowed stream one has to flatten it first, +thus getting rid of the windowing logic. For example execute +`maxByStock.flatten().print()` to print the stream of maximum prices of + the time windows by stock. For Scala `flatten()` is called implicitly +when needed. + +[Back to top](#top) + +Data-driven windows +--------------- + +The most interesting event in the stream is when the price of a stock +is changing rapidly. We can send a warning when a stock price changes +more than 5% since the last warning. To do that, we use a delta-based window providing a +threshold on when the computation will be triggered, a function to +compute the difference and a default value with which the first record +is compared. We also create a `Count` data type to count the warnings +every 30 seconds. + +Data-driven windowing semantics + +
+ +
+ +{{< highlight scala >}} +case class Count(symbol: String, count: Int) +val defaultPrice = StockPrice("", 1000) + +//Use delta policy to create price change warnings +val priceWarnings = stockStream.groupBy("symbol") + .window(Delta.of(0.05, priceChange, defaultPrice)) + .mapWindow(sendWarning _) + +//Count the number of warnings every half a minute +val warningsPerStock = priceWarnings.map(Count(_, 1)) + .groupBy("symbol") + .window(Time.of(30, SECONDS)) + .sum("count") + +def priceChange(p1: StockPrice, p2: StockPrice): Double = { + Math.abs(p1.price / p2.price - 1) +} + +def sendWarning(ts: Iterable[StockPrice], out: Collector[String]) = { + if (ts.nonEmpty) out.collect(ts.head.symbol) +} + +{{< / highlight >}} + +
+ +
+ +{{< highlight java >}} + +private static final Double DEFAULT_PRICE = 1000.; +private static final StockPrice DEFAULT_STOCK_PRICE = new StockPrice("", DEFAULT_PRICE); + +//Use delta policy to create price change warnings +DataStream priceWarnings = stockStream.groupBy("symbol") + .window(Delta.of(0.05, new DeltaFunction() { + @Override + public double getDelta(StockPrice oldDataPoint, StockPrice newDataPoint) { + return Math.abs(oldDataPoint.price - newDataPoint.price); + } + }, DEFAULT_STOCK_PRICE)) +.mapWindow(new SendWarning()).flatten(); + +//Count the number of warnings every half a minute +DataStream warningsPerStock = priceWarnings.map(new MapFunction() { + @Override + public Count map(String value) throws Exception { + return new Count(value, 1); + } +}).groupBy("symbol").window(Time.of(30, TimeUnit.SECONDS)).sum("count").flatten(); + +public static class Count implements Serializable { + public String symbol; + public Integer count; + + public Count() { + } + + public Count(String symbol, Integer count) { + this.symbol = symbol; + this.count = count; + } + + @Override + public String toString() { + return "Count{" + + "symbol='" + symbol + '\'' + + ", count=" + count + + '}'; + } +} + +public static final class SendWarning implements MapWindowFunction { + @Override + public void mapWindow(Iterable values, Collector out) + throws Exception { + + if (values.iterator().hasNext()) { + out.collect(values.iterator().next().symbol); + } + } +} + +{{< / highlight >}} + +
+ +
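The delta-based trigger logic described above can be sketched without Flink: fire whenever the relative change since the last trigger point exceeds the threshold, starting from a default value. The `triggers` helper is hypothetical, not Flink's `Delta` policy API:

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaTriggerSketch {

    // Emit the index of every price whose relative change from the price at the
    // previous trigger exceeds `threshold`; `initial` is the default comparison value.
    static List<Integer> triggers(double[] prices, double threshold, double initial) {
        List<Integer> fired = new ArrayList<>();
        double last = initial;
        for (int i = 0; i < prices.length; i++) {
            if (Math.abs(prices[i] / last - 1) > threshold) {
                fired.add(i);
                last = prices[i];        // later records are compared to this one
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        double[] prices = {1000, 1020, 1060, 1062, 1200};
        // 1060 is 6% above 1000 -> fires; 1200 is ~13% above 1060 -> fires
        System.out.println(triggers(prices, 0.05, 1000.0)); // [2, 4]
    }
}
```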
+ +[Back to top](#top) + +Combining with a Twitter stream +--------------- + +Next, we will read a Twitter stream and correlate it with our stock +price stream. Flink has support for connecting to [Twitter's +API]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/connectors/twitter.html) +but for the sake of this example we generate dummy tweet data. + +Social media analytics + +
+ +
+ + +{{< highlight scala >}} +//Read a stream of tweets +val tweetStream = env.addSource(generateTweets _) + +//Extract the stock symbols +val mentionedSymbols = tweetStream.flatMap(tweet => tweet.split(" ")) + .map(_.toUpperCase()) + .filter(symbols.contains(_)) + +//Count the extracted symbols +val tweetsPerStock = mentionedSymbols.map(Count(_, 1)) + .groupBy("symbol") + .window(Time.of(30, SECONDS)) + .sum("count") + +def generateTweets(out: Collector[String]) = { + while (true) { + val s = for (i <- 1 to 3) yield (symbols(Random.nextInt(symbols.size))) + out.collect(s.mkString(" ")) + Thread.sleep(Random.nextInt(500)) + } +} +{{< / highlight >}} + +
+ +
+ +{{< highlight java >}} +//Read a stream of tweets +DataStream tweetStream = env.addSource(new TweetSource()); + +//Extract the stock symbols +DataStream mentionedSymbols = tweetStream.flatMap( + new FlatMapFunction() { + @Override + public void flatMap(String value, Collector out) throws Exception { + String[] words = value.split(" "); + for (String word : words) { + out.collect(word.toUpperCase()); + } + } +}).filter(new FilterFunction() { + @Override + public boolean filter(String value) throws Exception { + return SYMBOLS.contains(value); + } +}); + +//Count the extracted symbols +DataStream tweetsPerStock = mentionedSymbols.map(new MapFunction() { + @Override + public Count map(String value) throws Exception { + return new Count(value, 1); + } +}).groupBy("symbol").window(Time.of(30, TimeUnit.SECONDS)).sum("count").flatten(); + +public static final class TweetSource implements SourceFunction { + Random random; + StringBuilder stringBuilder; + + @Override + public void invoke(Collector collector) throws Exception { + random = new Random(); + stringBuilder = new StringBuilder(); + + while (true) { + stringBuilder.setLength(0); + for (int i = 0; i < 3; i++) { + stringBuilder.append(" "); + stringBuilder.append(SYMBOLS.get(random.nextInt(SYMBOLS.size()))); + } + collector.collect(stringBuilder.toString()); + Thread.sleep(500); + } + + } +} + +{{< / highlight >}} + +
+ +
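The flatMap/filter/count pipeline above reduces to tokenizing tweets, normalizing case, keeping known symbols, and counting per symbol. A self-contained sketch with a hypothetical class name, collapsing the windowed count into a single map:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SymbolCountSketch {

    static final List<String> SYMBOLS =
            Arrays.asList("SPX", "FTSE", "DJI", "DJT", "BUX", "DAX", "GOOG");

    // Tokenize tweets, upper-case each word, keep known symbols, count occurrences.
    static Map<String, Integer> countMentions(List<String> tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String word : tweet.split(" ")) {
                String symbol = word.toUpperCase();
                if (SYMBOLS.contains(symbol)) {
                    counts.merge(symbol, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("spx to the moon", "buy SPX sell DJI");
        // SPX mentioned twice, DJI once
        System.out.println(countMentions(tweets));
    }
}
```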
+ +[Back to top](#top) + +Streaming joins +--------------- + +Finally, we join real-time tweets and stock prices and compute a +rolling correlation between the number of price warnings and the +number of mentions of a given stock in the Twitter stream. As both of +these data streams are potentially infinite, we apply the join on a +30-second window. + +Streaming joins + +
+ +
+ + +{{< highlight scala >}} + +//Join warnings and parsed tweets +val tweetsAndWarning = warningsPerStock.join(tweetsPerStock) + .onWindow(30, SECONDS) + .where("symbol") + .equalTo("symbol") { (c1, c2) => (c1.count, c2.count) } + +val rollingCorrelation = tweetsAndWarning.window(Time.of(30, SECONDS)) + .mapWindow(computeCorrelation _) + +rollingCorrelation print + +//Compute rolling correlation +def computeCorrelation(input: Iterable[(Int, Int)], out: Collector[Double]) = { + if (input.nonEmpty) { + val var1 = input.map(_._1) + val mean1 = average(var1) + val var2 = input.map(_._2) + val mean2 = average(var2) + + val cov = average(var1.zip(var2).map(xy => (xy._1 - mean1) * (xy._2 - mean2))) + val d1 = Math.sqrt(average(var1.map(x => Math.pow((x - mean1), 2)))) + val d2 = Math.sqrt(average(var2.map(x => Math.pow((x - mean2), 2)))) + + out.collect(cov / (d1 * d2)) + } +} + +{{< / highlight >}} + +
+ +
+ +{{< highlight java >}} + +//Join warnings and parsed tweets +DataStream> tweetsAndWarning = warningsPerStock + .join(tweetsPerStock) + .onWindow(30, TimeUnit.SECONDS) + .where("symbol") + .equalTo("symbol") + .with(new JoinFunction>() { + @Override + public Tuple2 join(Count first, Count second) throws Exception { + return new Tuple2(first.count, second.count); + } + }); + +//Compute rolling correlation +DataStream rollingCorrelation = tweetsAndWarning + .window(Time.of(30, TimeUnit.SECONDS)) + .mapWindow(new WindowCorrelation()); + +rollingCorrelation.print(); + +public static final class WindowCorrelation + implements WindowMapFunction, Double> { + + private Integer leftSum; + private Integer rightSum; + private Integer count; + + private Double leftMean; + private Double rightMean; + + private Double cov; + private Double leftSd; + private Double rightSd; + + @Override + public void mapWindow(Iterable> values, Collector out) + throws Exception { + + leftSum = 0; + rightSum = 0; + count = 0; + + cov = 0.; + leftSd = 0.; + rightSd = 0.; + + //compute mean for both sides, save count + for (Tuple2 pair : values) { + leftSum += pair.f0; + rightSum += pair.f1; + count++; + } + + leftMean = leftSum.doubleValue() / count; + rightMean = rightSum.doubleValue() / count; + + //compute covariance & std. deviations + for (Tuple2 pair : values) { + cov += (pair.f0 - leftMean) * (pair.f1 - rightMean) / count; + } + + for (Tuple2 pair : values) { + leftSd += Math.pow(pair.f0 - leftMean, 2) / count; + rightSd += Math.pow(pair.f1 - rightMean, 2) / count; + } + leftSd = Math.sqrt(leftSd); + rightSd = Math.sqrt(rightSd); + + out.collect(cov / (leftSd * rightSd)); + } +} + +{{< / highlight >}} + +
+ +
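The rolling correlation computed by `WindowCorrelation` above is the Pearson correlation of the two count series in a window. The same computation as a standalone method (hypothetical class name, arrays instead of tuples):

```java
public class CorrelationSketch {

    // Pearson correlation of two equally long series, as in WindowCorrelation above.
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY) / n;
            varX += Math.pow(x[i] - meanX, 2) / n;
            varY += Math.pow(y[i] - meanY, 2) / n;
        }
        return cov / (Math.sqrt(varX) * Math.sqrt(varY));
    }

    public static void main(String[] args) {
        double[] warnings = {1, 2, 3, 4};
        double[] mentions = {2, 4, 6, 8};   // perfectly correlated series
        System.out.println(correlation(warnings, mentions)); // 1.0 up to rounding
    }
}
```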
+
+[Back to top](#top)
+
+
+Other things to try
+---------------
+
+For a full feature overview please check the [Streaming Guide]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/index.html), which describes all the available API features.
+You are very welcome to try out our features for different use cases; we are looking forward to hearing about your experiences. Feel free to [contact us](http://flink.apache.org/community.html#mailing-lists).
+
+Upcoming for streaming
+---------------
+
+There are some aspects of Flink Streaming that are subject to
+change in the next release, which will make this application look even nicer.
+
+Stay tuned for later blog posts on how Flink Streaming works
+internally, fault tolerance, and performance measurements!
+
+[Back to top](#top)
diff --git a/docs/content.tr/posts/2015-03-02-february-2015-in-flink.md b/docs/content.tr/posts/2015-03-02-february-2015-in-flink.md
new file mode 100644
index 0000000000..e0c5bcd4f1
--- /dev/null
+++ b/docs/content.tr/posts/2015-03-02-february-2015-in-flink.md
@@ -0,0 +1,112 @@
+---
+date: "2015-03-02T10:00:00Z"
+title: February 2015 in the Flink community
+aliases:
+- /news/2015/03/02/february-2015-in-flink.html
+---
+
+February might be the shortest month of the year, but this does not
+mean that the Flink community has not been busy adding features to the
+system and fixing bugs. Here’s a rundown of the activity in the Flink
+community last month.
+
+### 0.8.1 release
+
+Flink 0.8.1 was released. This bugfix release resolves a total of 22 issues.
+
+### New committer
+
+[Max Michels](https://github.com/mxm) has been voted a committer by the Flink PMC.
+
+### Flink adapter for Apache SAMOA
+
+[Apache SAMOA (incubating)](http://samoa.incubator.apache.org) is a
+distributed streaming machine learning (ML) framework with a
+programming abstraction for distributed streaming ML algorithms. SAMOA
+runs on a variety of backend engines, currently Apache Storm and
+Apache S4.
A [pull
+request](https://github.com/apache/incubator-samoa/pull/11) is
+available at the SAMOA repository that adds a Flink adapter for SAMOA.
+
+### Easy Flink deployment on Google Compute Cloud
+
+Flink is now integrated in bdutil, Google’s open source tool for
+creating and configuring (Hadoop) clusters in Google Compute
+Engine. Deployment of Flink clusters is now supported starting with
+[bdutil
+1.2.0](https://groups.google.com/forum/#!topic/gcp-hadoop-announce/uVJ_6y9cGKM).
+
+### Flink on the Web
+
+A new blog post on [Flink
+Streaming](http://flink.apache.org/news/2015/02/09/streaming-example.html)
+was published on the blog. Flink was mentioned in several articles on
+the web. Here are some examples:
+
+- [How Flink became an Apache Top-Level Project](http://dataconomy.com/how-flink-became-an-apache-top-level-project/)
+
+- [Stale Synchronous Parallelism: The new frontier for Apache Flink?](https://www.linkedin.com/pulse/stale-synchronous-parallelism-new-frontier-apache-flink-nam-luc-tran?utm_content=buffer461af&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer)
+
+- [Distributed data processing with Apache Flink](http://www.hadoopsphere.com/2015/02/distributed-data-processing-with-apache.html)
+
+- [Ciao latency, hello speed](http://www.hadoopsphere.com/2015/02/ciao-latency-hallo-speed.html)
+
+## In the Flink master
+
+The following features have now been merged in Flink’s master repository.
+
+### Gelly
+
+Gelly, Flink’s Graph API, allows users to manipulate graph-shaped data
+directly. Here is, for example, a calculation of shortest paths in a
+graph:
+
+{{< highlight java >}}
+Graph<Long, Double, Double> graph = Graph.fromDataSet(vertices, edges, env);
+
+DataSet<Vertex<Long, Double>> singleSourceShortestPaths = graph
+    .run(new SingleSourceShortestPaths<Long>(srcVertexId,
+        maxIterations)).getVertices();
+{{< / highlight >}}
+
+See more Gelly examples
+[here](https://github.com/apache/flink/tree/master/flink-libraries/flink-gelly-examples).
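What the `SingleSourceShortestPaths` run above computes can be sketched without Gelly: a plain Bellman-Ford over an edge list. The types and names below are hypothetical illustrations, not Gelly's API:

```java
import java.util.Arrays;

public class ShortestPathsSketch {

    // Bellman-Ford over an edge list: edges[i] = {src, dst, weight}.
    // Returns the shortest distance from `source` to every vertex 0..n-1.
    static double[] shortestPaths(int n, double[][] edges, int source) {
        double[] dist = new double[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0;
        for (int iter = 0; iter < n - 1; iter++) {       // n-1 relaxation rounds
            for (double[] e : edges) {
                int src = (int) e[0], dst = (int) e[1];
                if (dist[src] + e[2] < dist[dst]) {
                    dist[dst] = dist[src] + e[2];        // relax the edge
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        double[][] edges = {{0, 1, 1}, {1, 2, 2}, {0, 2, 10}};
        // Path 0 -> 1 -> 2 (cost 3) beats the direct edge 0 -> 2 (cost 10).
        System.out.println(Arrays.toString(shortestPaths(3, edges, 0))); // [0.0, 1.0, 3.0]
    }
}
```

Gelly runs the equivalent computation as a vertex-centric iteration distributed across the cluster.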
+
+### Flink Expressions
+
+The newly merged
+[flink-table](https://github.com/apache/flink/tree/master/flink-libraries/flink-table)
+module is the first step in Flink’s roadmap towards logical queries
+and SQL support. Here’s a preview of how you can read two CSV files,
+assign logical schemas to them, and apply transformations like filters and
+joins using logical attributes rather than physical data types.
+
+{{< highlight scala >}}
+val customers = getCustomerDataSet(env)
+  .as('id, 'mktSegment)
+  .filter( 'mktSegment === "AUTOMOBILE" )
+
+val orders = getOrdersDataSet(env)
+  .filter( o => dateFormat.parse(o.orderDate).before(date) )
+  .as('orderId, 'custId, 'orderDate, 'shipPrio)
+
+val items =
+  orders.join(customers)
+    .where('custId === 'id)
+    .select('orderId, 'orderDate, 'shipPrio)
+{{< / highlight >}}
+
+### Access to HCatalog tables
+
+With the [flink-hcatalog
+module](https://github.com/apache/flink/tree/master/flink-batch-connectors/flink-hcatalog),
+you can now conveniently access HCatalog/Hive tables. The module
+supports projection (selection and order of fields) and partition
+filters.
+
+### Access to secured YARN clusters/HDFS
+
+With this change, users can access Kerberos-secured YARN (and HDFS)
+Hadoop clusters. Also, basic support for accessing secured HDFS with
+a standalone Flink setup is now available.
+
diff --git a/docs/content.tr/posts/2015-03-13-peeking-into-Apache-Flinks-Engine-Room.md b/docs/content.tr/posts/2015-03-13-peeking-into-Apache-Flinks-Engine-Room.md
new file mode 100644
index 0000000000..7c5d83baf9
--- /dev/null
+++ b/docs/content.tr/posts/2015-03-13-peeking-into-Apache-Flinks-Engine-Room.md
@@ -0,0 +1,182 @@
+---
+author: Fabian Hüske
+author-twitter: fhueske
+date: "2015-03-13T10:00:00Z"
+excerpt: Joins are prevalent operations in many data processing applications. Most
+  data processing systems feature APIs that make joining data sets very easy.
However,
+  the internal algorithms for join processing are much more involved – especially
+  if large data sets need to be efficiently handled. In this blog post, we cut through
+  Apache Flink’s layered architecture and take a look at its internals with a focus
+  on how it handles joins.
+title: Peeking into Apache Flink's Engine Room
+aliases:
+- /news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
+---
+
+### Join Processing in Apache Flink
+
+Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved – especially if large data sets need to be efficiently handled. Therefore, join processing serves as a good example to discuss the salient design points and implementation details of a data processing system.
+
+In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins. Specifically, I will
+
+* show how easy it is to join data sets using Flink’s fluent APIs,
+* discuss basic distributed join strategies, Flink’s join implementations, and its memory management,
+* talk about Flink’s optimizer that automatically chooses join strategies,
+* show some performance numbers for joining data sets of different sizes, and finally
+* briefly discuss joining of co-located and pre-sorted data sets.
+
+*Disclaimer*: This blog post is exclusively about equi-joins. Whenever I say “join” in the following, I actually mean “equi-join”.
+
+### How do I join with Flink?
+
+Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections called data sets. Data sets are processed by applying transformations that compute new data sets.
Flink’s transformations include Map and Reduce as known from MapReduce [[1]](http://research.google.com/archive/mapreduce.html) but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations [[2]]({{< param DocsBaseUrl >}}flink-docs-release-0.8/dataset_transformations.html).

Joining two Scala case class data sets is very easy, as the following example shows:

```scala
// define your data types
case class PageVisit(url: String, ip: String, userId: Long)
case class User(id: Long, name: String, email: String, country: String)

// get your data from somewhere
val visits: DataSet[PageVisit] = ...
val users: DataSet[User] = ...

// filter the users data set
val germanUsers = users.filter((u) => u.country.equals("de"))
// join data sets
val germanVisits: DataSet[(PageVisit, User)] =
      // equi-join condition (PageVisit.userId = User.id)
     visits.join(germanUsers).where("userId").equalTo("id")

```

Flink’s APIs also allow you to:

* apply a user-defined join function to each pair of joined elements instead of returning a `($Left, $Right)` tuple,
* select fields of pairs of joined Tuple elements (projection), and
* define composite join keys such as `.where(“orderDate”, “zipCode”).equalTo(“date”, “zip”)`.

See the documentation for more details on Flink’s join features [[3]]({{< param DocsBaseUrl >}}flink-docs-release-0.8/dataset_transformations.html#join).


### How does Flink join my data?

Flink uses techniques which are well known from parallel database systems to efficiently execute parallel joins. A join operator must establish all pairs of elements from its input data sets for which the join condition evaluates to true. In a standalone system, the most straightforward implementation of a join is the so-called nested-loop join, which builds the full Cartesian product and evaluates the join condition for each pair of elements.
This strategy has quadratic complexity and obviously does not scale to large inputs. + +In a distributed system, joins are commonly processed in two steps: + +1. The data of both inputs is distributed across all parallel instances that participate in the join and +1. each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data. + +The distribution of data across parallel instances must ensure that each valid join pair can be locally built by exactly one instance. For both steps, there are multiple valid strategies that can be independently picked and which are favorable in different situations. In Flink terminology, the first phase is called Ship Strategy and the second phase Local Strategy. In the following, I will describe Flink’s ship and local strategies to join two data sets *R* and *S*. + +#### Ship Strategies +Flink features two ship strategies to establish a valid data partitioning for a join: + +* the *Repartition-Repartition* strategy (RR) and +* the *Broadcast-Forward* strategy (BF). + +The Repartition-Repartition strategy partitions both inputs, R and S, on their join key attributes using the same partitioning function. Each partition is assigned to exactly one parallel join instance and all data of that partition is sent to its associated instance. This ensures that all elements that share the same join key are shipped to the same parallel instance and can be locally joined. The cost of the RR strategy is a full shuffle of both data sets over the network. + +
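The invariant behind the RR strategy can be sketched in a few lines. The partitioner below is a hypothetical illustration, not Flink's actual code: because both inputs are routed with the same deterministic hash function on the join key, matching records from *R* and *S* always land on the same parallel instance.

```java
public class RepartitionSketch {
    // Hypothetical partitioner: assigns a record to one of `parallelism`
    // join instances based only on its join key.
    static int partition(long joinKey, int parallelism) {
        return Math.floorMod(Long.hashCode(joinKey), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // A record of R and a record of S that share a join key...
        long rKey = 42L, sKey = 42L;
        // ...are shipped to the same instance, so the pair can be built locally.
        System.out.println(partition(rKey, parallelism) == partition(sKey, parallelism)); // prints: true
    }
}
```

Since every record of both inputs is routed through this function, each one may travel to a different machine, which is exactly why the cost of RR is a full shuffle of both data sets.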
+ +
+ +The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figure below illustrates how both ship strategies work. + +
+ +
+ +The Repartition-Repartition and Broadcast-Forward ship strategies establish suitable data distributions to execute a distributed join. Depending on the operations that are applied before the join, one or even both inputs of a join are already distributed in a suitable way across parallel instances. In this case, Flink will reuse such distributions and only ship one or no input at all. + +#### Flink’s Memory Management +Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory. While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an `OutOfMemoryException` and kill the JVM. + +Flink handles this challenge by actively managing its memory. When a worker node (TaskManager) is started, it allocates a fixed portion (70% by default) of the JVM’s heap memory that is available after initialization as 32KB byte arrays. These byte arrays are distributed as working memory to all algorithms that need to hold significant portions of data in memory. The algorithms receive their input data as Java data objects and serialize them into their working memory. + +This design has several nice properties. First, the number of data objects on the JVM heap is much lower resulting in less garbage collection pressure. Second, objects on the heap have a certain space overhead and the binary representation is more compact. Especially data sets of many small elements benefit from that. 
Third, an algorithm knows exactly when the input data exceeds its working memory and can react by writing some of its filled byte arrays to the worker’s local filesystem. After the content of a byte array is written to disk, it can be reused to process more data. Reading data back into memory is as simple as reading the binary data from the local filesystem. The following figure illustrates Flink’s memory management. + +
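As a rough sketch of this scheme (illustrative only; the class and method names are invented and this is not Flink's internal memory code): working memory is a pool of fixed-size buffers, an algorithm serializes records into the current buffer, switches to a fresh one when it is full, and learns that it must spill exactly when the pool runs dry.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WorkingMemorySketch {
    static final int SEGMENT_SIZE = 32 * 1024; // 32KB segments, as in the text

    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private final List<ByteBuffer> filled = new ArrayList<>();
    private ByteBuffer current;

    WorkingMemorySketch(int numSegments) {
        for (int i = 0; i < numSegments; i++) {
            free.push(ByteBuffer.allocate(SEGMENT_SIZE));
        }
        current = free.pop();
    }

    /** Appends a serialized record; returns false exactly when the working
     *  memory is exhausted, which is the point where a real algorithm would
     *  write filled segments to disk and reuse them. */
    boolean write(byte[] record) {
        if (current.remaining() < record.length) {
            if (free.isEmpty()) {
                return false; // out of working memory -> caller must spill
            }
            filled.add(current);
            current = free.pop();
        }
        current.put(record);
        return true;
    }

    public static void main(String[] args) {
        WorkingMemorySketch mem = new WorkingMemorySketch(2); // 64KB of working memory
        byte[] record = new byte[1024];                       // 1KB serialized records
        int written = 0;
        while (mem.write(record)) {
            written++;
        }
        System.out.println(written); // prints: 64 (two 32KB segments hold 64 x 1KB records)
    }
}
```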
+ +
+ +This active memory management makes Flink extremely robust for processing very large data sets on limited memory resources while preserving all benefits of in-memory processing if data is small enough to fit in-memory. De/serializing data into and from memory has a certain cost overhead compared to simply holding all data elements on the JVM’s heap. However, Flink features efficient custom de/serializers which also allow performing certain operations such as comparisons directly on serialized data without deserializing data objects from memory. + +#### Local Strategies + +After the data has been distributed across all parallel join instances using either a Repartition-Repartition or Broadcast-Forward ship strategy, each instance runs a local join algorithm to join the elements of its local partition. Flink’s runtime features two common join strategies to perform these local joins: + +* the *Sort-Merge-Join* strategy (SM) and +* the *Hybrid-Hash-Join* strategy (HH). + +The Sort-Merge-Join works by first sorting both input data sets on their join key attributes (Sort Phase) and merging the sorted data sets as a second step (Merge Phase). The sort is done in-memory if the local partition of a data set is small enough. Otherwise, an external merge-sort is done by collecting data until the working memory is filled, sorting it, writing the sorted data to the local filesystem, and starting over by filling the working memory again with more incoming data. After all input data has been received, sorted, and written as sorted runs to the local file system, a fully sorted stream can be obtained. This is done by reading the partially sorted runs from the local filesystem and sort-merging the records on the fly. 
Once the sorted streams of both inputs are available, both streams are sequentially read and merge-joined in a zig-zag fashion by comparing the sorted join key attributes, building join element pairs for matching keys, and advancing the sorted stream with the lower join key. The figure below shows how the Sort-Merge-Join strategy works. + +
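For illustration, the merge phase can be written down compactly. The following is a simplified, non-Flink sketch of the zig-zag merge over two inputs that are already sorted on their (here: integer) join keys; it also emits all pairs for duplicate keys:

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoinSketch {
    // Merge-joins two inputs that are already sorted by join key.
    static List<long[]> mergeJoin(long[] r, long[] s) {
        List<long[]> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < r.length && j < s.length) {
            if (r[i] < s[j]) {
                i++;            // advance the stream with the lower key
            } else if (r[i] > s[j]) {
                j++;
            } else {
                // matching keys: emit all pairs for this key value,
                // handling duplicate keys on both sides
                int jStart = j;
                while (i < r.length && r[i] == s[jStart]) {
                    for (j = jStart; j < s.length && s[j] == r[i]; j++) {
                        result.add(new long[]{r[i], s[j]});
                    }
                    i++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> joined = mergeJoin(new long[]{1, 2, 2, 4}, new long[]{2, 2, 3, 4});
        System.out.println(joined.size()); // prints: 5 -> four (2,2) pairs and one (4,4) pair
    }
}
```

Note that both inputs are read strictly sequentially, which is what makes this phase cheap once the sorted streams exist.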
+ +
+ +The Hybrid-Hash-Join distinguishes its inputs as build-side and probe-side input and works in two phases, a build phase followed by a probe phase. In the build phase, the algorithm reads the build-side input and inserts all data elements into an in-memory hash table indexed by their join key attributes. If the hash table outgrows the algorithm's working memory, parts of the hash table (ranges of hash indexes) are written to the local filesystem. The build phase ends after the build-side input has been fully consumed. In the probe phase, the algorithm reads the probe-side input and probes the hash table for each element using its join key attribute. If the element falls into a hash index range that was spilled to disk, the element is also written to disk. Otherwise, the element is immediately joined with all matching elements from the hash table. If the hash table completely fits into the working memory, the join is finished after the probe-side input has been fully consumed. Otherwise, the current hash table is dropped and a new hash table is built using spilled parts of the build-side input. This hash table is probed by the corresponding parts of the spilled probe-side input. Eventually, all data is joined. Hybrid-Hash-Joins perform best if the hash table completely fits into the working memory because an arbitrarily large probe-side input can be processed on-the-fly without materializing it. However, even if the build-side input does not fit into memory, the Hybrid-Hash-Join has very nice properties. In this case, in-memory processing is partially preserved and only a fraction of the build-side and probe-side data needs to be written to and read from the local filesystem. The next figure illustrates how the Hybrid-Hash-Join works. 
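Stripped of spilling, which is the part that makes the algorithm "hybrid", the in-memory core is a plain hash join. The following is an illustrative sketch, not Flink's implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    // In-memory build + probe. The real Hybrid-Hash-Join additionally spills
    // hash-index ranges of the table (and the matching probe records) to disk
    // when the build side outgrows the working memory.
    static List<long[]> hashJoin(long[] buildSide, long[] probeSide) {
        // Build phase: index the (smaller) build side by join key.
        Map<Long, List<Long>> table = new HashMap<>();
        for (long b : buildSide) {
            table.computeIfAbsent(b, k -> new ArrayList<>()).add(b);
        }
        // Probe phase: stream the probe side against the table.
        List<long[]> result = new ArrayList<>();
        for (long p : probeSide) {
            for (long b : table.getOrDefault(p, List.of())) {
                result.add(new long[]{b, p});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> joined = hashJoin(new long[]{1, 2, 4}, new long[]{2, 2, 3, 4});
        System.out.println(joined.size()); // prints: 3 -> (2,2), (2,2), (4,4)
    }
}
```

Note that only the build side is materialized while the probe side is streamed, which is why it pays off to use the smaller input as build side.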
+ +
+ +### How does Flink choose join strategies? + +Ship and local strategies do not depend on each other and can be independently chosen. Therefore, Flink can execute a join of two data sets R and S in nine different ways by combining any of the three ship strategies (RR, BF with R being broadcasted, BF with S being broadcasted) with any of the three local strategies (SM, HH with R being build-side, HH with S being build-side). Each of these strategy combinations results in different execution performance depending on the data sizes and the available amount of working memory. In case of a small data set R and a much larger data set S, broadcasting R and using it as build-side input of a Hybrid-Hash-Join is usually a good choice because the much larger data set S is not shipped and not materialized (given that the hash table completely fits into memory). If both data sets are rather large or the join is performed on many parallel instances, repartitioning both inputs is a robust choice. + +Flink features a cost-based optimizer which automatically chooses the execution strategies for all operators including joins. Without going into the details of cost-based optimization, this is done by computing cost estimates for execution plans with different strategies and picking the plan with the least estimated costs. Thereby, the optimizer estimates the amount of data which is shipped over the network and written to disk. If no reliable size estimates for the input data can be obtained, the optimizer falls back to robust default choices. A key feature of the optimizer is to reason about existing data properties. For example, if the data of one input is already partitioned in a suitable way, the generated candidate plans will not repartition this input. Hence, the choice of an RR ship strategy becomes more likely. The same applies for previously sorted data and the Sort-Merge-Join strategy. 
Flink programs can help the optimizer to reason about existing data properties by providing semantic information about user-defined functions [[4]]({{< param DocsBaseUrl >}}flink-docs-release-1.0/apis/batch/index.html#semantic-annotations). While the optimizer is a killer feature of Flink, it can happen that a user knows better than the optimizer how to execute a specific join. Similar to relational database systems, Flink offers optimizer hints to tell the optimizer which join strategies to pick [[5]]({{< param DocsBaseUrl >}}flink-docs-release-1.0/apis/batch/dataset_transformations.html#join-algorithm-hints). + +### How is Flink’s join performance? + +Alright, that sounds good, but how fast are joins in Flink? Let’s have a look. We start with a benchmark of the single-core performance of Flink’s Hybrid-Hash-Join implementation and run a Flink program that executes a Hybrid-Hash-Join with parallelism 1. We run the program on an n1-standard-2 Google Compute Engine instance (2 vCPUs, 7.5GB memory) with two locally attached SSDs. We give 4GB as working memory to the join. The join program generates 1KB records for both inputs on-the-fly, i.e., the data is not read from disk. We run 1:N (Primary-Key/Foreign-Key) joins and generate the smaller input with unique Integer join keys and the larger input with randomly chosen Integer join keys that fall into the key range of the smaller input. Hence, each tuple of the larger side joins with exactly one tuple of the smaller side. The result of the join is immediately discarded. We vary the size of the build-side input from 1 million to 12 million elements (1GB to 12GB). The probe-side input is kept constant at 64 million elements (64GB). The following chart shows the average execution time of three runs for each setup. 
+ +
+ +The joins with 1 to 3 GB build side (blue bars) are pure in-memory joins. The other joins partially spill data to disk (4 to 12GB, orange bars). The results show that the performance of Flink’s Hybrid-Hash-Join remains stable as long as the hash table completely fits into memory. As soon as the hash table becomes larger than the working memory, parts of the hash table and corresponding parts of the probe side are spilled to disk. The chart shows that the performance of the Hybrid-Hash-Join gracefully decreases in this situation, i.e., there is no sharp increase in runtime when the join starts spilling. In combination with Flink’s robust memory management, this execution behavior gives smooth performance without the need for fine-grained, data-dependent memory tuning. + +So, Flink’s Hybrid-Hash-Join implementation performs well on a single thread even for limited memory resources, but how good is Flink’s performance when joining larger data sets in a distributed setting? For the next experiment we compare the performance of the most common join strategy combinations, namely: + +* Broadcast-Forward, Hybrid-Hash-Join (broadcasting and building with the smaller side), +* Repartition, Hybrid-Hash-Join (building with the smaller side), and +* Repartition, Sort-Merge-Join + +for different input size ratios: + +* 1GB : 1000GB +* 10GB : 1000GB +* 100GB : 1000GB +* 1000GB : 1000GB + +The Broadcast-Forward strategy is only executed for up to 10GB. Building a hash table from 100GB broadcasted data in 5GB working memory would result in spilling approximately 95GB (build input) + 950GB (probe input) in each parallel thread and require more than 8TB local disk storage on each machine. + +As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. 
Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80. + +
+ +
+ +As expected, the Broadcast-Forward strategy performs best for very small inputs because the large probe side is not shipped over the network and is locally joined. However, when the size of the broadcasted side grows, two problems arise. First, the amount of data which is shipped increases, and second, each parallel instance has to process the full broadcasted data set. The performance of both Repartitioning strategies behaves similarly for growing input sizes, which indicates that these strategies are mainly limited by the cost of the data transfer (at most 2TB are shipped over the network and joined). Although the Sort-Merge-Join strategy shows the worst performance in all shown cases, it has a right to exist because it can nicely exploit sorted input data. + +### I’ve got sooo much data to join, do I really need to ship it? + +We have seen that off-the-shelf distributed joins work really well in Flink. But what if your data is so huge that you do not want to shuffle it across your cluster? We recently added some features to Flink for specifying semantic properties (partitioning and sorting) on input splits and co-located reading of local input files. With these tools at hand, it is possible to join pre-partitioned data sets from your local filesystem without sending a single byte over your cluster’s network. If the input data is even pre-sorted, the join can be done as a Sort-Merge-Join without sorting, i.e., the join is essentially done on-the-fly. Exploiting co-location requires a very special setup though. Data needs to be stored on the local filesystem because HDFS does not feature data co-location and might move file blocks across data nodes. That means you need to take care of many things yourself which HDFS would have done for you, including replication to avoid data loss. On the other hand, performance gains of joining co-located and pre-sorted data sets can be quite substantial. + +### tl;dr: What should I remember from all of this? 
+ +* Flink’s fluent Scala and Java APIs make joins and other data transformations easy as cake. +* The optimizer makes the hard choices for you, but gives you control in case you know better. +* Flink’s join implementations perform very well in-memory and gracefully degrade when going to disk. +* Due to Flink’s robust memory management, there is no need for job- or data-specific memory tuning to avoid a nasty `OutOfMemoryException`. It just runs out-of-the-box. + +#### References + +[1] [“MapReduce: Simplified data processing on large clusters”](http://research.google.com/archive/mapreduce.html), Dean, Ghemawat, 2004
+[2] [Flink 0.8.1 documentation: Data Transformations]({{< param DocsBaseUrl >}}flink-docs-release-0.8/dataset_transformations.html)
+[3] [Flink 0.8.1 documentation: Joins]({{< param DocsBaseUrl >}}flink-docs-release-0.8/dataset_transformations.html#join)
+[4] [Flink 1.0 documentation: Semantic annotations]({{< param DocsBaseUrl >}}flink-docs-release-1.0/apis/batch/index.html#semantic-annotations)
+[5] [Flink 1.0 documentation: Optimizer join hints]({{< param DocsBaseUrl >}}flink-docs-release-1.0/apis/batch/dataset_transformations.html#join-algorithm-hints)
diff --git a/docs/content.tr/posts/2015-04-07-march-in-flink.md b/docs/content.tr/posts/2015-04-07-march-in-flink.md new file mode 100644 index 0000000000..c7a82857bc --- /dev/null +++ b/docs/content.tr/posts/2015-04-07-march-in-flink.md @@ -0,0 +1,66 @@ +--- +date: "2015-04-07T10:00:00Z" +title: March 2015 in the Flink community +aliases: +- /news/2015/04/07/march-in-flink.html +--- + +March has been a busy month in the Flink community. + +### Scaling ALS + +Flink committers employed at [data Artisans](http://data-artisans.com) published a [blog post](http://data-artisans.com/how-to-factorize-a-700-gb-matrix-with-apache-flink/) on how they scaled matrix factorization with Flink and Google Compute Engine to matrices with 28 billion elements. + +### Learn about the internals of Flink + +The community has started an effort to better document the internals +of Flink. Check out the first articles on the Flink wiki on [how Flink +manages +memory](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525), +[how tasks in Flink exchange +data](https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks), +[type extraction and serialization in +Flink](https://cwiki.apache.org/confluence/display/FLINK/Type+System%2C+Type+Extraction%2C+Serialization), +as well as [how Flink builds on Akka for distributed +coordination](https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors). + +Check out also the [new blog +post](http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html) +on how Flink executes joins with several insights into Flink's runtime. + +### Meetups and talks + +Flink's machine learning efforts were presented at the [Machine +Learning Stockholm meetup +group](http://www.meetup.com/Machine-Learning-Stockholm/events/221144997/). The +regular Berlin Flink meetup featured a talk on the past, present, and +future of Flink. 
The talk is available on +[youtube](https://www.youtube.com/watch?v=fw2DBE6ZiEQ&feature=youtu.be). + +## In the Flink master + +### Table API in Scala and Java + +The new [Table +API](https://github.com/apache/flink/tree/master/flink-libraries/flink-table) +in Flink is now available in both Java and Scala. Check out the +examples [here (Java)](https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/java/org/apache/flink/examples/java/JavaTableExample.java) and [here (Scala)](https://github.com/apache/flink/tree/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/examples/scala). + +### Additions to the Machine Learning library + +Flink's [Machine Learning +library](https://github.com/apache/flink/tree/master/flink-libraries/flink-ml) +is seeing quite a bit of traction. Recent additions include the [CoCoA +algorithm](http://arxiv.org/abs/1409.1458) for distributed +optimization. + +### Exactly-once delivery guarantees for streaming jobs + +Flink streaming jobs now provide exactly once processing guarantees +when coupled with persistent sources (notably [Apache +Kafka](http://kafka.apache.org)). Flink periodically checkpoints and +persists the offsets of the sources and restarts from those +checkpoints at failure recovery. This functionality is currently +limited in that it does not yet handle large state and iterative +programs. + diff --git a/docs/content.tr/posts/2015-04-13-release-0.9.0-milestone1.md b/docs/content.tr/posts/2015-04-13-release-0.9.0-milestone1.md new file mode 100644 index 0000000000..cff0d2d963 --- /dev/null +++ b/docs/content.tr/posts/2015-04-13-release-0.9.0-milestone1.md @@ -0,0 +1,230 @@ +--- +date: "2015-04-13T10:00:00Z" +title: Announcing Flink 0.9.0-milestone1 preview release +aliases: +- /news/2015/04/13/release-0.9.0-milestone1.html +--- + +The Apache Flink community is pleased to announce the availability of +the 0.9.0-milestone-1 release. The release is a preview of the +upcoming 0.9.0 release. 
It contains many new features which will be +available in the upcoming 0.9 release. Interested users are encouraged +to try it out and give feedback. As the version number indicates, this +release is a preview release that contains known issues. + +You can download the release +[here](http://flink.apache.org/downloads.html#preview) and check out the +latest documentation +[here]({{< param DocsBaseUrl >}}flink-docs-master/). Feedback +through the Flink [mailing +lists](http://flink.apache.org/community.html#mailing-lists) is, as +always, very welcome! + +## New Features + +### Table API + +Flink’s new Table API offers a higher-level abstraction for +interacting with structured data sources. The Table API allows users +to execute logical, SQL-like queries on distributed data sets while +allowing them to freely mix declarative queries with regular Flink +operators. Here is an example that groups and joins two tables: + +```scala +val clickCounts = clicks + .groupBy('user).select('userId, 'url.count as 'count) + +val activeUsers = users.join(clickCounts) + .where('id === 'userId && 'count > 10).select('username, 'count, ...) +``` + +Tables consist of logical attributes that can be selected by name +rather than physical Java and Scala data types. This alleviates a lot +of boilerplate code for common ETL tasks and raises the abstraction +for Flink programs. Tables are available for both static and streaming +data sources (DataSet and DataStream APIs). + +Check out the Table guide for Java and Scala +[here]({{< param DocsBaseUrl >}}flink-docs-master/apis/batch/libs/table.html). + +### Gelly Graph Processing API + +Gelly is a Java Graph API for Flink. It contains a set of utilities +for graph analysis, support for iterative graph processing and a +library of graph algorithms. 
Gelly exposes a Graph data structure that +wraps DataSets for vertices and edges, as well as methods for creating +graphs from DataSets, graph transformations and utilities (e.g., in- +and out- degrees of vertices), neighborhood aggregations, iterative +vertex-centric graph processing, as well as a library of common graph +algorithms, including PageRank, SSSP, label propagation, and community +detection. + +Gelly internally builds on top of Flink’s [delta +iterations]({{< param DocsBaseUrl >}}flink-docs-master/apis/batch/iterations.html). Iterative +graph algorithms are executed leveraging mutable state, achieving +performance similar to specialized graph processing systems. + +Gelly will eventually subsume Spargel, Flink’s Pregel-like API. Check +out the Gelly guide +[here]({{< param DocsBaseUrl >}}flink-docs-master/apis/batch/libs/gelly.html). + +### Flink Machine Learning Library + +This release includes the first version of Flink’s Machine Learning +library. The library’s pipeline approach, which has been strongly +inspired by scikit-learn’s abstraction of transformers and estimators, +makes it easy to quickly set up a data processing pipeline and to get +your job done. + +Flink distinguishes between transformers and learners. Transformers +are components which transform your input data into a new format +allowing you to extract features, cleanse your data or to sample from +it. Learners, on the other hand, constitute the components which take +your input data and train a model on it. The model you obtain from the +learner can then be evaluated and used to make predictions on unseen +data. + +Currently, the machine learning library contains transformers and +learners to do multiple tasks. The library supports multiple linear +regression using a stochastic gradient implementation to scale to +large data sizes. Furthermore, it includes an alternating least +squares (ALS) implementation to factorize large matrices. 
The matrix +factorization can be used to do collaborative filtering. An +implementation of the communication efficient distributed dual +coordinate ascent (CoCoA) algorithm is the latest addition to the +library. The CoCoA algorithm can be used to train distributed +soft-margin SVMs. + +### Flink on YARN leveraging Apache Tez + +We are introducing a new execution mode for Flink to be able to run +restricted Flink programs on top of [Apache +Tez](http://tez.apache.org). This mode retains Flink’s APIs, +optimizer, as well as Flink’s runtime operators, but instead of +wrapping those in Flink tasks that are executed by Flink TaskManagers, +it wraps them in Tez runtime tasks and builds a Tez DAG that +represents the program. + +By using Flink on Tez, users have an additional choice for an +execution platform for Flink programs. While Flink’s distributed +runtime favors low latency, streaming shuffles, and iterative +algorithms, Tez focuses on scalability and elastic resource usage in +shared YARN clusters. + +Get started with Flink on Tez +[here]({{< param DocsBaseUrl >}}flink-docs-master/setup/flink_on_tez.html). + +### Reworked Distributed Runtime on Akka + +Flink’s RPC system has been replaced by the widely adopted +[Akka](http://akka.io) framework. Akka’s concurrency model offers the +right abstraction to develop a fast as well as robust distributed +system. By using Akka’s own failure detection mechanism the stability +of Flink’s runtime is significantly improved, because the system can +now react in proper form to node outages. Furthermore, Akka improves +Flink’s scalability by introducing asynchronous messages to the +system. These asynchronous messages allow Flink to be run on many more +nodes than before. 
+ +### Exactly-once processing on Kafka Streaming Sources + +This release introduces stream processing with exactly-once delivery +guarantees for Flink streaming programs that analyze streaming sources +that are persisted by [Apache Kafka](http://kafka.apache.org). The +system internally tracks the Kafka offsets to ensure that Flink +can pick up data from Kafka where it left off in case of a failure. + +Read +[here]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming_guide.html#apache-kafka) +on how to use the persistent Kafka source. + +### Improved YARN support + +Flink’s YARN client contains several improvements, such as a detached +mode for starting a YARN session in the background, the ability to +submit a single Flink job to a YARN cluster without starting a +session, including a “fire and forget” mode. Flink is now also able to +reallocate failed YARN containers to maintain the size of the +requested cluster. This feature allows implementing fault-tolerant +setups on top of YARN. There is also an internal Java API to deploy +and control a running YARN cluster. This is being used by system +integrators to easily control Flink on YARN within their Hadoop 2 +cluster. + +See the YARN docs +[here]({{< param DocsBaseUrl >}}flink-docs-master/setup/yarn_setup.html). + +## More Improvements and Fixes + +* [FLINK-1605](https://issues.apache.org/jira/browse/FLINK-1605): + Flink is not exposing its Guava and ASM dependencies to Maven + projects depending on Flink. We use the maven-shade-plugin to + relocate these dependencies into our own namespace. This allows + users to use any Guava or ASM version. + +* [FLINK-1417](https://issues.apache.org/jira/browse/FLINK-1417): +Automatic recognition and registration of Java Types at Kryo and the +internal serializers: Flink has its own type handling and +serialization framework falling back to Kryo for types that it cannot +handle. 
To get the best performance, Flink automatically registers +all types a user is using in their program with Kryo. Flink also +registers serializers for Protocol Buffers, Thrift, Avro and Joda-Time +automatically. Users can also manually register serializers with Kryo +([FLINK-1399](https://issues.apache.org/jira/browse/FLINK-1399)). + +* [FLINK-1296](https://issues.apache.org/jira/browse/FLINK-1296): Add + support for sorting very large records + +* [FLINK-1679](https://issues.apache.org/jira/browse/FLINK-1679): + "degreeOfParallelism" methods renamed to "parallelism" + +* [FLINK-1501](https://issues.apache.org/jira/browse/FLINK-1501): Add + metrics library for monitoring TaskManagers + +* [FLINK-1760](https://issues.apache.org/jira/browse/FLINK-1760): Add + support for building Flink with Scala 2.11 + +* [FLINK-1648](https://issues.apache.org/jira/browse/FLINK-1648): Add + a mode where the system automatically sets the parallelism to the + available task slots + +* [FLINK-1622](https://issues.apache.org/jira/browse/FLINK-1622): Add + groupCombine operator + +* [FLINK-1589](https://issues.apache.org/jira/browse/FLINK-1589): Add + option to pass Configuration to LocalExecutor + +* [FLINK-1504](https://issues.apache.org/jira/browse/FLINK-1504): Add + support for accessing secured HDFS clusters in standalone mode + +* [FLINK-1478](https://issues.apache.org/jira/browse/FLINK-1478): Add + strictly local input split assignment + +* [FLINK-1512](https://issues.apache.org/jira/browse/FLINK-1512): Add + CsvReader for reading into POJOs. 
+ +* [FLINK-1461](https://issues.apache.org/jira/browse/FLINK-1461): Add + sortPartition operator + +* [FLINK-1450](https://issues.apache.org/jira/browse/FLINK-1450): Add + Fold operator to the Streaming api + +* [FLINK-1389](https://issues.apache.org/jira/browse/FLINK-1389): + Allow setting custom file extensions for files created by the + FileOutputFormat + +* [FLINK-1236](https://issues.apache.org/jira/browse/FLINK-1236): Add + support for localization of Hadoop Input Splits + +* [FLINK-1179](https://issues.apache.org/jira/browse/FLINK-1179): Add + button to JobManager web interface to request stack trace of a + TaskManager + +* [FLINK-1105](https://issues.apache.org/jira/browse/FLINK-1105): Add + support for locally sorted output + +* [FLINK-1688](https://issues.apache.org/jira/browse/FLINK-1688): Add + socket sink + +* [FLINK-1436](https://issues.apache.org/jira/browse/FLINK-1436): + Improve usability of command line interface diff --git a/docs/content.tr/posts/2015-05-11-Juggling-with-Bits-and-Bytes.md b/docs/content.tr/posts/2015-05-11-Juggling-with-Bits-and-Bytes.md new file mode 100644 index 0000000000..b4c2ae47f1 --- /dev/null +++ b/docs/content.tr/posts/2015-05-11-Juggling-with-Bits-and-Bytes.md @@ -0,0 +1,190 @@ +--- +author: Fabian Hüske +author-twitter: fhueske +date: "2015-05-11T10:00:00Z" +excerpt: |- +

Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but newer frameworks such as Apache Spark, Apache Drill, and Apache Flink also run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs.

+

In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.

+title: Juggling with Bits and Bytes +aliases: +- /news/2015/05/11/Juggling-with-Bits-and-Bytes.html +--- + +## How Apache Flink operates on binary data + +Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs. + +In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data. + +## Data Objects? Let’s put them on the heap! + +The most straight-forward approach to process lots of data in a JVM is to put it as objects on the heap and operate on these objects. Caching a data set as objects would be as simple as maintaining a list containing an object for each record. An in-memory sort would simply sort the list of objects. +However, this approach has a few notable drawbacks. First of all it is not trivial to watch and control heap memory usage when a lot of objects are created and invalidated constantly. Memory overallocation instantly kills the JVM with an `OutOfMemoryError`. Another aspect is garbage collection on multi-GB JVMs which are flooded with new objects. The overhead of garbage collection in such environments can easily reach 50% and more. Finally, Java objects come with a certain space overhead depending on the JVM and platform. For data sets with many small objects this can significantly reduce the effectively usable amount of memory. 
Given proficient system design and careful, use-case specific system parameter tuning, heap memory usage can be more or less controlled and `OutOfMemoryErrors` avoided. However, such setups are rather fragile especially if data characteristics or the execution environment change. + +## What is Flink doing about that? + +Apache Flink has its roots at a research project which aimed to combine the best technologies of MapReduce-based systems and parallel database systems. Coming from this background, Flink has always had its own way of processing data in-memory. Instead of putting lots of objects on the heap, Flink serializes objects into a fixed number of pre-allocated memory segments. Its DBMS-style sort and join algorithms operate as much as possible on this binary data to keep the de/serialization overhead at a minimum. If more data needs to be processed than can be kept in memory, Flink’s operators partially spill data to disk. In fact, a lot of Flink’s internal implementations look more like C/C++ rather than common Java. The following figure gives a high-level overview of how Flink stores data serialized in memory segments and spills to disk if necessary. + +
+ +
+ +Flink’s style of active memory management and operating on binary data has several benefits: + +1. **Memory-safe execution & efficient out-of-core algorithms.** Due to the fixed amount of allocated memory segments, it is trivial to monitor remaining memory resources. In case of memory shortage, processing operators can efficiently write larger batches of memory segments to disk and later read them back. Consequently, `OutOfMemoryErrors` are effectively prevented. +2. **Reduced garbage collection pressure.** Because all long-lived data is in binary representation in Flink's managed memory, all data objects are short-lived or even mutable and can be reused. Short-lived objects can be more efficiently garbage-collected, which significantly reduces garbage collection pressure. Right now, the pre-allocated memory segments are long-lived objects on the JVM heap, but the Flink community is actively working on allocating off-heap memory for this purpose. This effort will result in much smaller JVM heaps and facilitate even faster garbage collection cycles. +3. **Space-efficient data representation.** Java objects have a storage overhead which can be avoided if the data is stored in a binary representation. +4. **Efficient binary operations & cache sensitivity.** Binary data can be efficiently compared and operated on given a suitable binary representation. Furthermore, the binary representations can put related values, as well as hash codes, keys, and pointers, adjacently into memory. This usually gives data structures more cache-efficient access patterns. + +These properties of active memory management are very desirable in data processing systems for large-scale data analytics but have a significant price tag attached. Active memory management and operating on binary data is not trivial to implement, i.e., using `java.util.HashMap` is much easier than implementing a spillable hash-table backed by byte arrays and a custom serialization stack. 
Of course, Apache Flink is not the only JVM-based data processing system that operates on serialized binary data. Projects such as [Apache Drill](http://drill.apache.org/), [Apache Ignite (incubating)](http://ignite.incubator.apache.org/) or [Apache Geode (incubating)](http://projectgeode.org/) apply similar techniques, and it was recently announced that [Apache Spark](http://spark.apache.org/) will also evolve in this direction with [Project Tungsten](https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html). + +In the following, we discuss in detail how Flink allocates memory, de/serializes objects, and operates on binary data. We will also show some performance numbers comparing processing objects on the heap and operating on binary data. + + +## How does Flink allocate memory? + +A Flink worker, called TaskManager, is composed of several internal components such as an actor system for coordination with the Flink master, an IOManager that takes care of spilling data to disk and reading it back, and a MemoryManager that coordinates memory usage. In the context of this blog post, the MemoryManager is of most interest. + +The MemoryManager takes care of allocating, accounting, and distributing MemorySegments to data processing operators such as sort and join operators. A [MemorySegment](https://github.com/apache/flink/blob/release-0.9.0-milestone-1/flink-core/src/main/java/org/apache/flink/core/memory/MemorySegment.java) is Flink’s distribution unit of memory and is backed by a regular Java byte array (size is 32 KB by default). A MemorySegment provides very efficient write and read access to its backing byte array using Java’s unsafe methods. You can think of a MemorySegment as a custom-tailored version of Java’s NIO ByteBuffer. In order to operate on multiple MemorySegments as on a larger chunk of consecutive memory, Flink uses logical views that implement Java’s `java.io.DataOutput` and `java.io.DataInput` interfaces. 
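The idea of typed access on top of a plain byte array can be sketched as follows. This is an illustrative approximation using `java.nio.ByteBuffer`, not Flink's actual `MemorySegment` class (which uses `sun.misc.Unsafe` internally):

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the MemorySegment idea: a fixed-size chunk of memory
// backed by a plain byte[] with typed put/get access at arbitrary offsets.
public class MemorySegmentSketch {
    static final int SEGMENT_SIZE = 32 * 1024; // Flink's default segment size

    private final ByteBuffer data = ByteBuffer.wrap(new byte[SEGMENT_SIZE]);

    void putInt(int offset, int value)   { data.putInt(offset, value); }
    int  getInt(int offset)              { return data.getInt(offset); }
    void putLong(int offset, long value) { data.putLong(offset, value); }
    long getLong(int offset)             { return data.getLong(offset); }

    public static void main(String[] args) {
        MemorySegmentSketch segment = new MemorySegmentSketch();
        segment.putInt(0, 42);
        segment.putLong(4, 123456789L);
        System.out.println(segment.getInt(0) + " " + segment.getLong(4)); // 42 123456789
    }
}
```

In Flink, the `DataOutput`/`DataInput` views mentioned above stitch several such segments together so that records can span segment boundaries.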
+ +MemorySegments are allocated once at TaskManager start-up time and are destroyed when the TaskManager is shut down. Hence, they are reused and not garbage-collected over the whole lifetime of a TaskManager. After all internal data structures of a TaskManager have been initialized and all core services have been started, the MemoryManager starts creating MemorySegments. By default 70% of the JVM heap that is available after service initialization is allocated by the MemoryManager. It is also possible to configure an absolute amount of managed memory. The remaining JVM heap is used for objects that are instantiated during task processing, including objects created by user-defined functions. The following figure shows the memory distribution in the TaskManager JVM after startup. + +
+ +
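The default memory split described above amounts to simple arithmetic; a minimal sketch (illustrative only, assuming the default fraction of 0.7):

```java
// Illustrative arithmetic only, not Flink's actual code: by default the
// MemoryManager claims a fixed fraction (0.7) of the heap that is still free
// after the TaskManager's services have been initialized.
public class ManagedMemorySketch {
    static final double DEFAULT_MANAGED_FRACTION = 0.7;

    static long managedMemoryBytes(long freeHeapAfterInit) {
        return (long) (freeHeapAfterInit * DEFAULT_MANAGED_FRACTION);
    }

    public static void main(String[] args) {
        long freeHeap = 1_000_000_000L; // e.g. ~1 GB free after initialization
        System.out.println(managedMemoryBytes(freeHeap)); // 700000000
    }
}
```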
+ +## How does Flink serialize objects? + +The Java ecosystem offers several libraries to convert objects into a binary representation and back. Common alternatives are standard Java serialization, [Kryo](https://github.com/EsotericSoftware/kryo), [Apache Avro](http://avro.apache.org/), [Apache Thrift](http://thrift.apache.org/), or Google’s [Protobuf](https://github.com/google/protobuf). Flink includes its own custom serialization framework in order to control the binary representation of data. This is important because operating on binary data such as comparing or even manipulating binary data requires exact knowledge of the serialization layout. Further, configuring the serialization layout with respect to operations that are performed on binary data can yield a significant performance boost. Flink’s serialization stack also leverages the fact that the types of the objects that go through de/serialization are exactly known before a program is executed. + +Flink programs can process data represented as arbitrary Java or Scala objects. Before a program is optimized, the data types at each processing step of the program’s data flow need to be identified. For Java programs, Flink features a reflection-based type extraction component to analyze the return types of user-defined functions. Scala programs are analyzed with the help of the Scala compiler. Flink represents each data type with a [TypeInformation](https://github.com/apache/flink/blob/release-0.9.0-milestone-1/flink-core/src/main/java/org/apache/flink/api/common/typeinfo/TypeInformation.java). Flink has TypeInformations for several kinds of data types, including: + +* BasicTypeInfo: Any (boxed) Java primitive type or java.lang.String. +* BasicArrayTypeInfo: Any array of a (boxed) Java primitive type or java.lang.String. +* WritableTypeInfo: Any implementation of Hadoop’s Writable interface. +* TupleTypeInfo: Any Flink tuple (Tuple1 to Tuple25). 
Flink tuples are Java representations for fixed-length tuples with typed fields. +* CaseClassTypeInfo: Any Scala CaseClass (including Scala tuples). +* PojoTypeInfo: Any POJO (Java or Scala), i.e., an object with all fields either being public or accessible through getters and setters that follow the common naming conventions. +* GenericTypeInfo: Any data type that cannot be identified as another type. + +Each TypeInformation provides a serializer for the data type it represents. For example, a BasicTypeInfo returns a serializer that writes the respective primitive type, the serializer of a WritableTypeInfo delegates de/serialization to the write() and readFields() methods of the object implementing Hadoop’s Writable interface, and a GenericTypeInfo returns a serializer that delegates serialization to Kryo. Object serialization to a DataOutput which is backed by Flink MemorySegments goes automatically through Java’s efficient unsafe operations. For data types that can be used as keys, i.e., compared and hashed, the TypeInformation provides TypeComparators. TypeComparators compare and hash objects and can - depending on the concrete data type - also efficiently compare binary representations and extract fixed-length binary key prefixes. + +Tuple, Pojo, and CaseClass types are composite types, i.e., containers for one or more possibly nested data types. As such, their serializers and comparators are also composite and delegate the serialization and comparison of their member data types to the respective serializers and comparators. The following figure illustrates the serialization of a (nested) `Tuple3` object where `Person` is a POJO and defined as follows: + +```java +public class Person { + public int id; + public String name; +} +``` + +
+ +
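A rough sketch of this field-by-field delegation, using plain `java.io` streams instead of Flink's MemorySegment-backed outputs (the class and the field values below are made up for illustration, this is not Flink's actual TypeSerializer API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of composite delegation: a tuple serializer writes each field in
// order and delegates the nested Person POJO to its own field-wise
// serialization.
public class TupleSerializerSketch {

    static byte[] serialize(int f0, double f1, int personId, String personName)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(f0);         // tuple field 0
        out.writeDouble(f1);      // tuple field 1
        out.writeInt(personId);   // delegated: Person.id
        out.writeUTF(personName); // delegated: Person.name
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] binary = serialize(1, 3.14, 42, "Alice");
        // Deserialization reads the fields back in the same order.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(binary));
        System.out.println(in.readInt() + " " + in.readDouble() + " "
                + in.readInt() + " " + in.readUTF()); // 1 3.14 42 Alice
    }
}
```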
+ +Flink’s type system can be easily extended by providing custom TypeInformations, Serializers, and Comparators to improve the performance of serializing and comparing custom data types. + +## How does Flink operate on binary data? + +Similar to many other data processing APIs (including SQL), Flink’s APIs provide transformations to group, sort, and join data sets. These transformations operate on potentially very large data sets. Relational database systems have featured very efficient algorithms for these purposes for several decades, including external merge-sort, merge-join, and hybrid hash-join. Flink builds on this technology, but generalizes it to handle arbitrary objects using its custom serialization and comparison stack. In the following, we show how Flink operates with binary data by the example of Flink’s in-memory sort algorithm. + +Flink assigns a memory budget to its data processing operators. Upon initialization, a sort algorithm requests its memory budget from the MemoryManager and receives a corresponding set of MemorySegments. The set of MemorySegments becomes the memory pool of a so-called sort buffer which collects the data that is to be sorted. The following figure illustrates how data objects are serialized into the sort buffer. + +
+ +
+ +The sort buffer is internally organized into two memory regions. The first region holds the full binary data of all objects. The second region contains pointers to the full binary object data and - depending on the key data type - fixed-length sort keys. When an object is added to the sort buffer, its binary data is appended to the first region, and a pointer (and possibly a key) is appended to the second region. The separation of actual data and pointers plus fixed-length keys is done for two purposes. It enables efficient swapping of fixed-length entries (key+pointer) and also reduces the data that needs to be moved when sorting. If the sort key is a variable-length data type such as a String, the fixed-length sort key must be a prefix key such as the first n characters of a String. Note that not all data types provide a fixed-length (prefix) sort key. When serializing objects into the sort buffer, both memory regions are extended with MemorySegments from the memory pool. Once the memory pool is empty and no more objects can be added, the sort buffer is completely filled and can be sorted. Flink’s sort buffer provides methods to compare and swap elements. This makes the actual sort algorithm pluggable. By default, Flink uses a Quicksort implementation which can fall back to HeapSort. +The following figure shows how two objects are compared. + +
+ +
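The comparison strategy in the figure can be sketched as follows. This is an illustrative approximation only (Flink's actual TypeComparators work on the binary key region directly), assuming an 8-byte zero-padded ASCII prefix key for Strings:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of prefix-key comparison: compare fixed-length binary prefixes first
// and fall back to the full (deserialized) values only when the prefixes tie.
public class PrefixKeySketch {
    static final int PREFIX_LENGTH = 8;

    static byte[] prefixKey(String value) {
        byte[] key = new byte[PREFIX_LENGTH]; // zero-padded, fixed length
        byte[] raw = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(raw, 0, key, 0, Math.min(raw.length, PREFIX_LENGTH));
        return key;
    }

    static int compare(String a, String b) {
        int result = Arrays.compare(prefixKey(a), prefixKey(b));
        // Only on a prefix tie do we touch ("deserialize") the full values.
        return result != 0 ? result : a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(compare("abc", "abd") < 0);                   // true: prefixes differ
        System.out.println(compare("samePrefix_1", "samePrefix_2") < 0); // true: tie, full compare
    }
}
```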
+ +The sort buffer compares two elements by comparing their binary fixed-length sort keys. The comparison is conclusive if it is either done on a full key (not a prefix key) or if the binary prefix keys are not equal. If the prefix keys are equal (or the sort key data type does not provide a binary prefix key), the sort buffer follows the pointers to the actual object data, deserializes both objects and compares the objects. Depending on the result of the comparison, the sort algorithm decides whether to swap the compared elements or not. The sort buffer swaps two elements by moving their fixed-length keys and pointers. The actual data is not moved. Once the sort algorithm finishes, the pointers in the sort buffer are correctly ordered. The following figure shows how the sorted data is returned from the sort buffer. + +
+ +
+ +The sorted data is returned by sequentially reading the pointer region of the sort buffer, skipping the sort keys and following the sorted pointers to the actual data. This data is either deserialized and returned as objects or the binary representation is copied and written to disk in case of an external merge-sort (see this [blog post on joins in Flink](http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html)). + +## Show me numbers! + +So, what does operating on binary data mean for performance? We’ll run a benchmark that sorts 10 million `Tuple2<Integer, String>` objects to find out. The values of the Integer field are sampled from a uniform distribution. The String field values have a length of 12 characters and are sampled from a long-tail distribution. The input data is provided by an iterator that returns a mutable object, i.e., the same tuple object instance is returned with different field values. Flink uses this technique when reading data from memory, network, or disk to avoid unnecessary object instantiations. The benchmarks are run in a JVM with 900 MB heap size which is approximately the required amount of memory to store and sort 10 million tuple objects on the heap without dying of an `OutOfMemoryError`. We sort the tuples on the Integer field and on the String field using three sorting methods: + +1. **Object-on-heap.** The tuples are stored in a regular `java.util.ArrayList` with initial capacity set to 10 million entries and sorted using Java’s regular collection sort. +2. **Flink-serialized.** The tuple fields are serialized into a sort buffer of 600 MB size using Flink’s custom serializers, sorted as described above, and finally deserialized again. When sorting on the Integer field, the full Integer is used as sort key such that the sort happens entirely on binary data (no deserialization of objects required). 
For sorting on the String field an 8-byte prefix key is used and tuple objects are deserialized if the prefix keys are equal. +3. **Kryo-serialized.** The tuple fields are serialized into a sort buffer of 600 MB size using Kryo serialization and sorted without binary sort keys. This means that each pair-wise comparison requires two objects to be deserialized. + +All sort methods are implemented using a single thread. The reported times are averaged over ten runs. After each run, we call `System.gc()` to request a garbage collection run which does not go into measured execution time. The following figure shows the time to store the input data in memory, sort it, and read it back as objects. + +
+ +
+ +We see that Flink’s sort on binary data using its own serializers significantly outperforms the other two methods. Comparing to the object-on-heap method, we see that loading the data into memory is much faster. Since we actually collect the objects, there is no opportunity to reuse the object instances, but have to re-create every tuple. This is less efficient than Flink’s serializers (or Kryo serialization). On the other hand, reading objects from the heap comes for free compared to deserialization. In our benchmark, object cloning was more expensive than serialization and deserialization combined. Looking at the sorting time, we see that also sorting on the binary representation is faster than Java’s collection sort. Sorting data that was serialized using Kryo without binary sort key, is much slower than both other methods. This is due to the heavy deserialization overhead. Sorting the tuples on their String field is faster than sorting on the Integer field due to the long-tailed value distribution which significantly reduces the number of pair-wise comparisons. To get a better feeling of what is happening during sorting we monitored the executing JVM using VisualVM. The following screenshots show heap memory usage, garbage collection activity and CPU usage over the execution of 10 runs. + + + + + + + + + + + + + + + + + + + + + + +
*(Figure: VisualVM screenshots of garbage collection activity and memory usage for the Object-on-Heap (int), Flink-Serialized (int), and Kryo-Serialized (int) runs.)*
+ +The experiments were run single-threaded on an 8-core machine, so full utilization of one core only corresponds to a 12.5% overall utilization. The screenshots show that operating on binary data significantly reduces garbage collection activity. For the object-on-heap approach, the garbage collector runs in very short intervals while filling the sort buffer and causes a lot of CPU usage even for a single processing thread (sorting itself does not trigger the garbage collector). The JVM garbage collects with multiple parallel threads, explaining the high overall CPU utilization. On the other hand, the methods that operate on serialized data rarely trigger the garbage collector and have a much lower CPU utilization. In fact, the garbage collector does not run at all if the tuples are sorted on the Integer field using the flink-serialized method because no objects need to be deserialized for pair-wise comparisons. The kryo-serialized method requires slightly more garbage collection since it does not use binary sort keys and deserializes two objects for each comparison. + +The memory usage charts show that the flink-serialized and kryo-serialized methods constantly occupy a high amount of memory (plus some objects for operation). This is due to the pre-allocation of MemorySegments. The actual memory usage is much lower, because the sort buffers are not completely filled. The following table shows the memory consumption of each method. 10 million records result in about 280 MB of binary data (object data plus pointers and sort keys) depending on the used serializer and presence and size of a binary sort key. Comparing this to the memory requirements of the object-on-heap approach, we see that operating on binary data can significantly improve memory efficiency. In our benchmark, more than twice as much data can be sorted in-memory if serialized into a sort buffer instead of holding it as objects on the heap. 
| Occupied Memory | Object-on-Heap | Flink-Serialized | Kryo-Serialized |
| --- | --- | --- | --- |
| Sort on Integer | approx. 700 MB (heap) | 277 MB (sort buffer) | 266 MB (sort buffer) |
| Sort on String | approx. 700 MB (heap) | 315 MB (sort buffer) | 266 MB (sort buffer) |
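As a back-of-the-envelope check of the numbers in the table, the binary footprint is just records times bytes per record (the per-record byte counts below are rough assumptions, not Flink's exact layout):

```java
// Rough footprint estimate for the benchmark: per-record data bytes plus a
// pointer and a fixed-length sort key in the pointer region. The byte counts
// are assumptions; exact sizes depend on the serializer and key type.
public class SortBufferFootprint {

    static long footprintBytes(long records, int dataBytes, int pointerBytes, int keyBytes) {
        return records * (dataBytes + pointerBytes + keyBytes);
    }

    public static void main(String[] args) {
        long records = 10_000_000L;
        int dataBytes = 4 + 14;  // ~4 bytes int + ~14 bytes for a 12-char string
        int pointerBytes = 8;    // pointer into the data region
        int keyBytes = 8;        // fixed-length prefix key (String sort)
        long mb = footprintBytes(records, dataBytes, pointerBytes, keyBytes) / (1024 * 1024);
        System.out.println(mb + " MB"); // 324 MB, in the ballpark of the measured 315 MB
    }
}
```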
+ +
+ +To summarize, the experiments verify the previously stated benefits of operating on binary data. + +## We’re not done yet! + +Apache Flink features quite a few advanced techniques to safely and efficiently process huge amounts of data with limited memory resources. However, there are a few points that could make Flink even more efficient. The Flink community is working on moving the managed memory to off-heap memory. This will allow for smaller JVMs, lower garbage collection overhead, and also easier system configuration. With Flink’s Table API, the semantics of all operations such as aggregations and projections are known (in contrast to black-box user-defined functions). Hence, we can generate code for Table API operations that directly operates on binary data. Further improvements include serialization layouts which are tailored towards the operations that are applied on the binary data and code generation for serializers and comparators. + +The groundwork (and a lot more) for operating on binary data is done, but there is still some room for making Flink even better and faster. If you are crazy about performance and like to juggle with lots of bits and bytes, join the Flink community! + +## TL;DR: Give me three things to remember! + +* Flink’s active memory management avoids nasty `OutOfMemoryErrors` that kill your JVMs and reduces garbage collection overhead. +* Flink features a highly efficient data de/serialization stack that facilitates operations on binary data and makes more data fit into memory. +* Flink’s DBMS-style operators operate natively on binary data yielding high performance in-memory and destage gracefully to disk if necessary. 
diff --git a/docs/content.tr/posts/2015-05-14-Community-update-April.md b/docs/content.tr/posts/2015-05-14-Community-update-April.md new file mode 100644 index 0000000000..9fdaf2b9c2 --- /dev/null +++ b/docs/content.tr/posts/2015-05-14-Community-update-April.md @@ -0,0 +1,45 @@ +--- +author: Kostas Tzoumas +author-twitter: kostas_tzoumas +date: "2015-05-14T10:00:00Z" +excerpt:

The monthly update from the Flink community. Including the availability + of a new preview release, lots of meetups and conference talks and a great interview + about Flink.

+title: April 2015 in the Flink community +aliases: +- /news/2015/05/14/Community-update-April.html +--- + + +April was a packed month for Apache Flink. + +### Flink runner for Google Cloud Dataflow + +A Flink runner for Google Cloud Dataflow was announced. See the blog +posts by [data Artisans](http://data-artisans.com/announcing-google-cloud-dataflow-on-flink-and-easy-flink-deployment-on-google-cloud/) and +the [Google Cloud Platform Blog](http://googlecloudplatform.blogspot.de/2015/03/announcing-Google-Cloud-Dataflow-runner-for-Apache-Flink.html). +Google Cloud Dataflow programs can be written using an open-source +SDK and run on multiple backends, either as a managed service inside +Google's infrastructure, or leveraging open-source runners, +including Apache Flink. + + +## Flink 0.9.0-milestone1 release + +The highlight of April was of course the availability of [Flink 0.9-milestone1](/news/2015/04/13/release-0.9.0-milestone1.html). This was a release packed with new features, including a Python DataSet API, the new SQL-like Table API, FlinkML, a machine learning library on Flink, Gelly, Flink's Graph API, as well as a mode to run Flink on YARN leveraging Tez. In case you missed it, check out the [release announcement blog post](/news/2015/04/13/release-0.9.0-milestone1.html) for details. + +## Conferences and meetups + +April kicked off the conference season. Apache Flink was presented at ApacheCon in Texas ([slides](http://www.slideshare.net/fhueske/apache-flink)); the Hadoop Summit in Brussels featured two talks on Flink (see slides [here](http://www.slideshare.net/AljoschaKrettek/data-analysis-with-apache-flink-hadoop-summit-2015) and [here](http://www.slideshare.net/GyulaFra/flink-streaming-hadoopsummit)); and Flink was also presented at the Hadoop User Groups of the Netherlands ([slides](http://www.slideshare.net/stephanewen1/apache-flink-overview-and-use-cases-at-prehadoop-summit-meetups)) and Stockholm. 
The brand new [Apache Flink meetup Stockholm](http://www.meetup.com/Apache-Flink-Stockholm/) was also established. + +## Google Summer of Code + +Three students will work on Flink during Google's [Summer of Code program](https://www.google-melange.com/gsoc/homepage/google/gsoc2015) on distributed pattern matching, exact and approximate statistics for data streams and windows, as well as asynchronous iterations and updates. + +## Flink on the web + +Fabian Hueske gave an [interview at InfoQ](http://www.infoq.com/news/2015/04/hueske-apache-flink?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global) on Apache Flink. + +## Upcoming events + +Stay tuned for a wealth of upcoming events! Two Flink talks will be presented at [Berlin Buzzwords](http://berlinbuzzwords.de/15/sessions), and Flink will be presented at the [Hadoop Summit in San Jose](http://2015.hadoopsummit.org/san-jose/). A [training workshop on Apache Flink](http://www.meetup.com/Apache-Flink-Meetup/events/220557545/) is being organized in Berlin. Finally, [Flink Forward](http://2015.flink-forward.org/), the first conference to bring together the whole Flink community, is taking place in Berlin in October 2015. diff --git a/docs/content.tr/posts/2015-06-24-announcing-apache-flink-0.9.0-release.md b/docs/content.tr/posts/2015-06-24-announcing-apache-flink-0.9.0-release.md new file mode 100644 index 0000000000..81a5fb4ae1 --- /dev/null +++ b/docs/content.tr/posts/2015-06-24-announcing-apache-flink-0.9.0-release.md @@ -0,0 +1,190 @@ +--- +date: "2015-06-24T14:00:00Z" +title: Announcing Apache Flink 0.9.0 +aliases: +- /news/2015/06/24/announcing-apache-flink-0.9.0-release.html +--- + +The Apache Flink community is pleased to announce the availability of the 0.9.0 release. The release is the result of many months of hard work within the Flink community. It contains many new features and improvements which were previewed in the 0.9.0-milestone1 release and have been polished since then. 
This is the largest Flink release so far. + +[Download the release](http://flink.apache.org/downloads.html) and check out [the documentation]({{< param DocsBaseUrl >}}flink-docs-release-0.9/). Feedback through the Flink [mailing lists](http://flink.apache.org/community.html#mailing-lists) is, as always, very welcome! + +## New Features + +### Exactly-once Fault Tolerance for streaming programs + +This release introduces a new fault tolerance mechanism for streaming dataflows. The new checkpointing algorithm takes data sources and also user-defined state into account and recovers from failures such that all records are reflected exactly once in the operator states. + +The checkpointing algorithm is lightweight and driven by barriers that are periodically injected into the data streams at the sources. As such, it has an extremely low coordination overhead and is able to sustain very high throughput rates. User-defined state can be automatically backed up to configurable storage by the fault tolerance mechanism. + +Please refer to [the documentation on stateful computation]({{< param DocsBaseUrl >}}flink-docs-release-0.9/apis/streaming_guide.html#stateful-computation) for details on how to use fault tolerant data streams with Flink. + +The fault tolerance mechanism requires data sources that can replay recent parts of the stream, such as [Apache Kafka](http://kafka.apache.org). Read more [about how to use the persistent Kafka source]({{< param DocsBaseUrl >}}flink-docs-release-0.9/apis/streaming_guide.html#apache-kafka). + +### Table API + +Flink’s new Table API offers a higher-level abstraction for interacting with structured data sources. The Table API allows users to execute logical, SQL-like queries on distributed data sets while allowing them to freely mix declarative queries with regular Flink operators. 
Here is an example that groups and joins two tables: + +```scala +val clickCounts = clicks + .groupBy('userId).select('userId, 'url.count as 'count) + +val activeUsers = users.join(clickCounts) + .where('id === 'userId && 'count > 10).select('username, 'count, ...) +``` + +Tables consist of logical attributes that can be selected by name rather than physical Java and Scala data types. This alleviates a lot of boilerplate code for common ETL tasks and raises the abstraction for Flink programs. Tables are available for both static and streaming data sources (DataSet and DataStream APIs). + +[Check out the Table guide for Java and Scala]({{< param DocsBaseUrl >}}flink-docs-release-0.9/libs/table.html). + +### Gelly Graph Processing API + +Gelly is a Java Graph API for Flink. It contains a set of utilities for graph analysis, support for iterative graph processing and a library of graph algorithms. Gelly exposes a Graph data structure that wraps DataSets for vertices and edges, as well as methods for creating graphs from DataSets, graph transformations and utilities (e.g., in- and out- degrees of vertices), neighborhood aggregations, iterative vertex-centric graph processing, as well as a library of common graph algorithms, including PageRank, SSSP, label propagation, and community detection. + +Gelly internally builds on top of Flink’s [delta iterations]({{< param DocsBaseUrl >}}flink-docs-release-0.9/apis/iterations.html). Iterative graph algorithms are executed leveraging mutable state, achieving performance similar to specialized graph processing systems. + +Gelly will eventually subsume Spargel, Flink’s Pregel-like API. + +Note: The Gelly library is still in beta status and subject to improvements and heavy performance tuning. + +[Check out the Gelly guide]({{< param DocsBaseUrl >}}flink-docs-release-0.9/libs/gelly_guide.html). + +### Flink Machine Learning Library + +This release includes the first version of Flink’s Machine Learning library. 
The library’s pipeline approach, which has been strongly inspired by scikit-learn’s abstraction of transformers and predictors, makes it easy to quickly set up a data processing pipeline and to get your job done. + +Flink distinguishes between transformers and predictors. Transformers are components which transform your input data into a new format, allowing you to extract features, cleanse your data or to sample from it. Predictors, on the other hand, constitute the components which take your input data and train a model on it. The model you obtain from the learner can then be evaluated and used to make predictions on unseen data. + +Currently, the machine learning library contains transformers and predictors to do multiple tasks. The library supports multiple linear regression using stochastic gradient descent to scale to large data sizes. Furthermore, it includes an alternating least squares (ALS) implementation to factorize large matrices. The matrix factorization can be used to do collaborative filtering. An implementation of the communication efficient distributed dual coordinate ascent (CoCoA) algorithm is the latest addition to the library. The CoCoA algorithm can be used to train distributed soft-margin SVMs. + +Note: The ML library is still in beta status and subject to improvements and heavy performance tuning. + +[Check out FlinkML]({{< param DocsBaseUrl >}}flink-docs-release-0.9/libs/ml/). + +### Flink on YARN leveraging Apache Tez + +We are introducing a new execution mode for Flink to be able to run restricted Flink programs on top of [Apache Tez](http://tez.apache.org). This mode retains Flink’s APIs, optimizer, as well as Flink’s runtime operators, but instead of wrapping those in Flink tasks that are executed by Flink TaskManagers, it wraps them in Tez runtime tasks and builds a Tez DAG that represents the program. + +By using Flink on Tez, users have an additional choice for an execution platform for Flink programs. 
While Flink’s distributed runtime favors low latency, streaming shuffles, and iterative algorithms, Tez focuses on scalability and elastic resource usage in shared YARN clusters.

[Get started with Flink on Tez]({{< param DocsBaseUrl >}}flink-docs-release-0.9/setup/flink_on_tez.html).

### Reworked Distributed Runtime on Akka

Flink’s RPC system has been replaced by the widely adopted [Akka](http://akka.io) framework. Akka’s concurrency model offers the right abstraction to develop a fast and robust distributed system. By using Akka’s own failure detection mechanism, the stability of Flink’s runtime is significantly improved, because the system can now react properly to node outages. Furthermore, Akka improves Flink’s scalability by introducing asynchronous messages to the system. These asynchronous messages allow Flink to run on many more nodes than before.

### Improved YARN support

Flink’s YARN client contains several improvements, such as a detached mode for starting a YARN session in the background, and the ability to submit a single Flink job to a YARN cluster without starting a session, including a "fire and forget" mode. Flink is now also able to reallocate failed YARN containers to maintain the size of the requested cluster. This feature makes it possible to implement fault-tolerant setups on top of YARN. There is also an internal Java API to deploy and control a running YARN cluster. This is being used by system integrators to easily control Flink on YARN within their Hadoop 2 clusters.

[See the YARN docs]({{< param DocsBaseUrl >}}flink-docs-release-0.9/setup/yarn_setup.html).

### Static Code Analysis for the Flink Optimizer: Opening the UDF blackboxes

This release introduces a first version of a static code analyzer that pre-interprets functions written by the user to get information about the function’s internal dataflow.
The code analyzer can provide useful information about [forwarded fields]({{< param DocsBaseUrl >}}flink-docs-release-0.9/apis/programming_guide.html#semantic-annotations) to Flink's optimizer and thus speed up job executions. It also reports obvious mistakes in the code. For stability reasons, the code analyzer is initially disabled by default. It can be activated through

`ExecutionEnvironment.getExecutionConfig().setCodeAnalysisMode(...)`

either as an assistant that gives hints during implementation or by directly applying the optimizations that have been found.

## More Improvements and Fixes

* [FLINK-1605](https://issues.apache.org/jira/browse/FLINK-1605): Flink no longer exposes its Guava and ASM dependencies to Maven projects depending on Flink. We use the maven-shade-plugin to relocate these dependencies into our own namespace. This allows users to use any Guava or ASM version.

* [FLINK-1417](https://issues.apache.org/jira/browse/FLINK-1417): Automatic recognition and registration of Java types at Kryo and the internal serializers: Flink has its own type handling and serialization framework, falling back to Kryo for types that it cannot handle. To get the best performance, Flink automatically registers with Kryo all types a user is using in their program. Flink also registers serializers for Protocol Buffers, Thrift, Avro and Joda-Time automatically.
Users can also manually register serializers to Kryo (https://issues.apache.org/jira/browse/FLINK-1399) + +* [FLINK-1296](https://issues.apache.org/jira/browse/FLINK-1296): Add support for sorting very large records + +* [FLINK-1679](https://issues.apache.org/jira/browse/FLINK-1679): "degreeOfParallelism" methods renamed to “parallelism” + +* [FLINK-1501](https://issues.apache.org/jira/browse/FLINK-1501): Add metrics library for monitoring TaskManagers + +* [FLINK-1760](https://issues.apache.org/jira/browse/FLINK-1760): Add support for building Flink with Scala 2.11 + +* [FLINK-1648](https://issues.apache.org/jira/browse/FLINK-1648): Add a mode where the system automatically sets the parallelism to the available task slots + +* [FLINK-1622](https://issues.apache.org/jira/browse/FLINK-1622): Add groupCombine operator + +* [FLINK-1589](https://issues.apache.org/jira/browse/FLINK-1589): Add option to pass Configuration to LocalExecutor + +* [FLINK-1504](https://issues.apache.org/jira/browse/FLINK-1504): Add support for accessing secured HDFS clusters in standalone mode + +* [FLINK-1478](https://issues.apache.org/jira/browse/FLINK-1478): Add strictly local input split assignment + +* [FLINK-1512](https://issues.apache.org/jira/browse/FLINK-1512): Add CsvReader for reading into POJOs. 
+ +* [FLINK-1461](https://issues.apache.org/jira/browse/FLINK-1461): Add sortPartition operator + +* [FLINK-1450](https://issues.apache.org/jira/browse/FLINK-1450): Add Fold operator to the Streaming api + +* [FLINK-1389](https://issues.apache.org/jira/browse/FLINK-1389): Allow setting custom file extensions for files created by the FileOutputFormat + +* [FLINK-1236](https://issues.apache.org/jira/browse/FLINK-1236): Add support for localization of Hadoop Input Splits + +* [FLINK-1179](https://issues.apache.org/jira/browse/FLINK-1179): Add button to JobManager web interface to request stack trace of a TaskManager + +* [FLINK-1105](https://issues.apache.org/jira/browse/FLINK-1105): Add support for locally sorted output + +* [FLINK-1688](https://issues.apache.org/jira/browse/FLINK-1688): Add socket sink + +* [FLINK-1436](https://issues.apache.org/jira/browse/FLINK-1436): Improve usability of command line interface + +* [FLINK-2174](https://issues.apache.org/jira/browse/FLINK-2174): Allow comments in 'slaves' file + +* [FLINK-1698](https://issues.apache.org/jira/browse/FLINK-1698): Add polynomial base feature mapper to ML library + +* [FLINK-1697](https://issues.apache.org/jira/browse/FLINK-1697): Add alternating least squares algorithm for matrix factorization to ML library + +* [FLINK-1792](https://issues.apache.org/jira/browse/FLINK-1792): FLINK-456 Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only + +* [FLINK-1672](https://issues.apache.org/jira/browse/FLINK-1672): Refactor task registration/unregistration + +* [FLINK-2001](https://issues.apache.org/jira/browse/FLINK-2001): DistanceMetric cannot be serialized + +* [FLINK-1676](https://issues.apache.org/jira/browse/FLINK-1676): enableForceKryo() is not working as expected + +* [FLINK-1959](https://issues.apache.org/jira/browse/FLINK-1959): Accumulators BROKEN after Partitioning + +* [FLINK-1696](https://issues.apache.org/jira/browse/FLINK-1696): Add multiple linear regression to 
ML library + +* [FLINK-1820](https://issues.apache.org/jira/browse/FLINK-1820): Bug in DoubleParser and FloatParser - empty String is not casted to 0 + +* [FLINK-1985](https://issues.apache.org/jira/browse/FLINK-1985): Streaming does not correctly forward ExecutionConfig to runtime + +* [FLINK-1828](https://issues.apache.org/jira/browse/FLINK-1828): Impossible to output data to an HBase table + +* [FLINK-1952](https://issues.apache.org/jira/browse/FLINK-1952): Cannot run ConnectedComponents example: Could not allocate a slot on instance + +* [FLINK-1848](https://issues.apache.org/jira/browse/FLINK-1848): Paths containing a Windows drive letter cannot be used in FileOutputFormats + +* [FLINK-1954](https://issues.apache.org/jira/browse/FLINK-1954): Task Failures and Error Handling + +* [FLINK-2004](https://issues.apache.org/jira/browse/FLINK-2004): Memory leak in presence of failed checkpoints in KafkaSource + +* [FLINK-2132](https://issues.apache.org/jira/browse/FLINK-2132): Java version parsing is not working for OpenJDK + +* [FLINK-2098](https://issues.apache.org/jira/browse/FLINK-2098): Checkpoint barrier initiation at source is not aligned with snapshotting + +* [FLINK-2069](https://issues.apache.org/jira/browse/FLINK-2069): writeAsCSV function in DataStream Scala API creates no file + +* [FLINK-2092](https://issues.apache.org/jira/browse/FLINK-2092): Document (new) behavior of print() and execute() + +* [FLINK-2177](https://issues.apache.org/jira/browse/FLINK-2177): NullPointer in task resource release + +* [FLINK-2054](https://issues.apache.org/jira/browse/FLINK-2054): StreamOperator rework removed copy calls when passing output to a chained operator + +* [FLINK-2196](https://issues.apache.org/jira/browse/FLINK-2196): Missplaced Class in flink-java SortPartitionOperator + +* [FLINK-2191](https://issues.apache.org/jira/browse/FLINK-2191): Inconsistent use of Closure Cleaner in Streaming API + +* [FLINK-2206](https://issues.apache.org/jira/browse/FLINK-2206): 
JobManager webinterface shows 5 finished jobs at most + +* [FLINK-2188](https://issues.apache.org/jira/browse/FLINK-2188): Reading from big HBase Tables + +* [FLINK-1781](https://issues.apache.org/jira/browse/FLINK-1781): Quickstarts broken due to Scala Version Variables + +## Notice + +The 0.9 series of Flink is the last version to support Java 6. If you are still using Java 6, please consider upgrading to Java 8 (Java 7 ended its free support in April 2015). + +Flink will require at least Java 7 in major releases after 0.9.0. diff --git a/docs/content.tr/posts/2015-08-24-introducing-flink-gelly.md b/docs/content.tr/posts/2015-08-24-introducing-flink-gelly.md new file mode 100644 index 0000000000..87cfe37088 --- /dev/null +++ b/docs/content.tr/posts/2015-08-24-introducing-flink-gelly.md @@ -0,0 +1,454 @@ +--- +date: "2015-08-24T00:00:00Z" +title: 'Introducing Gelly: Graph Processing with Apache Flink' +aliases: +- /news/2015/08/24/introducing-flink-gelly.html +--- + +This blog post introduces **Gelly**, Apache Flink's *graph-processing API and library*. Flink's native support +for iterations makes it a suitable platform for large-scale graph analytics. +By leveraging delta iterations, Gelly is able to map various graph processing models such as +vertex-centric or gather-sum-apply to Flink dataflows. + +Gelly allows Flink users to perform end-to-end data analysis in a single system. +Gelly can be seamlessly used with Flink's DataSet API, +which means that pre-processing, graph creation, analysis, and post-processing can be done +in the same application. At the end of this post, we will go through a step-by-step example +in order to demonstrate that loading, transformation, filtering, graph creation, and analysis +can be performed in a single Flink program. + +**Overview** + +1. [What is Gelly?](#what-is-gelly) +2. [Graph Representation and Creation](#graph-representation-and-creation) +3. [Transformations and Utilities](#transformations-and-utilities) +4. 
[Iterative Graph Processing](#iterative-graph-processing) +5. [Library of Graph Algorithms](#library-of-graph-algorithms) +6. [Use-Case: Music Profiles](#use-case-music-profiles) +7. [Ongoing and Future Work](#ongoing-and-future-work) + + + +## What is Gelly? + +Gelly is a Graph API for Flink. It is currently supported in both Java and Scala. +The Scala methods are implemented as wrappers on top of the basic Java operations. +The API contains a set of utility functions for graph analysis, supports iterative graph +processing and introduces a library of graph algorithms. + +
+ +
+ +[Back to top](#top) + +## Graph Representation and Creation + +In Gelly, a graph is represented by a DataSet of vertices and a DataSet of edges. +A vertex is defined by its unique ID and a value, whereas an edge is defined by its source ID, +target ID, and value. A vertex or edge for which a value is not specified will simply have the +value type set to `NullValue`. + +A graph can be created from: + +1. **DataSet of edges** and an optional **DataSet of vertices** using `Graph.fromDataSet()` +2. **DataSet of Tuple3** and an optional **DataSet of Tuple2** using `Graph.fromTupleDataSet()` +3. **Collection of edges** and an optional **Collection of vertices** using `Graph.fromCollection()` + +In all three cases, if the vertices are not provided, +Gelly will automatically produce the vertex IDs from the edge source and target IDs. + +[Back to top](#top) + +## Transformations and Utilities + +These are methods of the Graph class and include common graph metrics, transformations +and mutations as well as neighborhood aggregations. + +#### Common Graph Metrics +These methods can be used to retrieve several graph metrics and properties, such as the number +of vertices, edges and the node degrees. + +#### Transformations +The transformation methods enable several Graph operations, using high-level functions similar to +the ones provided by the batch processing API. These transformations can be applied one after the +other, yielding a new Graph after each step, in a fashion similar to operators on DataSets: + +```java +inputGraph.getUndirected().mapEdges(new CustomEdgeMapper()); +``` + +Transformations can be applied on: + +1. **Vertices**: `mapVertices`, `joinWithVertices`, `filterOnVertices`, `addVertex`, ... +2. **Edges**: `mapEdges`, `filterOnEdges`, `removeEdge`, ... +3. **Triplets** (source vertex, target vertex, edge): `getTriplets` + +#### Neighborhood Aggregations + +Neighborhood methods allow vertices to perform an aggregation on their first-hop neighborhood. 
This provides a vertex-centric view, where each vertex can access its neighboring edges and neighbor values.

`reduceOnEdges()` provides access to the neighboring edges of a vertex, i.e. the edge value and the vertex ID of the edge endpoint. In order to also access the neighboring vertices’ values, one should call the `reduceOnNeighbors()` function. The scope of the neighborhood is defined by the EdgeDirection parameter, which can be IN, OUT or ALL, to gather the incoming, outgoing or all edges (neighbors) of a vertex.

The two neighborhood functions mentioned above can only be used when the aggregation function is associative and commutative. In case the function does not comply with these restrictions, or if it is desirable to return zero, one or more values per vertex, the more general `groupReduceOnEdges()` and `groupReduceOnNeighbors()` functions must be called.

Consider the following graph, for instance:
+ +
+ +Assume you would want to compute the sum of the values of all incoming neighbors for each vertex. +We will call the `reduceOnNeighbors()` aggregation method since the sum is an associative and commutative operation and the neighbors’ values are needed: + +```java +graph.reduceOnNeighbors(new SumValues(), EdgeDirection.IN); +``` + +The vertex with id 1 is the only node that has no incoming edges. The result is therefore: + +
+ +
[Back to top](#top)

## Iterative Graph Processing

During the past few years, many different programming models for distributed graph processing have been introduced: [vertex-centric](http://delivery.acm.org/10.1145/2490000/2484843/a22-salihoglu.pdf?ip=141.23.53.206&id=2484843&acc=ACTIVE%20SERVICE&key=2BA2C432AB83DA15.0F42380CB8DD3307.4D4702B0C3E38B35.4D4702B0C3E38B35&CFID=706313474&CFTOKEN=60107876&__acm__=1440408958_b131e035942130653e5782409b5c0cde), [partition-centric](http://researcher.ibm.com/researcher/files/us-ytian/giraph++.pdf), [gather-apply-scatter](http://www.eecs.harvard.edu/cs261/notes/gonzalez-2012.htm), [edge-centric](http://infoscience.epfl.ch/record/188535/files/paper.pdf), [neighborhood-centric](http://www.vldb.org/pvldb/vol7/p1673-quamar.pdf). Each of these models targets a specific class of graph applications, and each corresponding system implementation optimizes the runtime accordingly. In Gelly, we would like to exploit the flexible dataflow model and the efficient iterations of Flink to support multiple distributed graph processing models on top of the same system.

Currently, Gelly has methods for writing vertex-centric programs and provides support for programs implemented using the gather-sum(accumulate)-apply model. We are also considering offering support for the partition-centric computation model, using Flink’s `mapPartition()` operator. This model exposes the partition structure to the user and allows local graph structure exploitation inside a partition to avoid unnecessary communication.

#### Vertex-centric

Gelly wraps Flink’s [Spargel API]({{< param DocsBaseUrl >}}flink-docs-release-0.8/spargel_guide.html) to support the vertex-centric, Pregel-like programming model. Gelly’s `runVertexCentricIteration` method accepts two user-defined functions:

1. **MessagingFunction:** defines what messages a vertex sends out for the next superstep.
2. 
**VertexUpdateFunction:** defines how a vertex will update its value based on the received messages.

The method will execute the vertex-centric iteration on the input Graph and return a new Graph with updated vertex values.

Gelly’s vertex-centric programming model exploits Flink’s efficient delta iteration operators. Many iterative graph algorithms expose non-uniform behavior, where some vertices converge to their final value faster than others. In such cases, the number of vertices that need to be recomputed during an iteration decreases as the algorithm moves towards convergence.

For example, consider a Single Source Shortest Paths problem on the following graph, where S is the source node, i is the iteration counter and the edge values represent distances between nodes:
+ +
In each iteration, a vertex receives distances from its neighbors and adopts the minimum of these distances and its current distance as the new value. Then, it propagates its new value to its neighbors. If a vertex does not change value during an iteration, there is no need for it to propagate its old distance to its neighbors, as they have already taken it into account.

Flink’s `IterateDelta` operator permits exploitation of this property, as well as the execution of computations solely on the active parts of the graph. The operator receives two inputs:

1. the **Solution Set**, which represents the current state of the input and
2. the **Workset**, which determines which parts of the graph will be recomputed in the next iteration.

In the SSSP example above, the Workset contains the vertices which update their distances. The user-defined iterative function is applied on these inputs to produce state updates. These updates are efficiently applied on the state, which is kept in memory.
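The Solution Set/Workset interplay can be sketched on a single machine in plain Java. This is an illustration of the delta-iteration idea only, not Flink's `IterateDelta` operator, and the graph below is made up: the point is that only vertices whose distance changed in a superstep enter the next workset.

```java
import java.util.*;

public class SsspWorksetSketch {
    public static void main(String[] args) {
        // Made-up directed graph: source vertex -> (target vertex, edge distance).
        Map<Integer, Map<Integer, Integer>> edges = new HashMap<>();
        edges.put(0, Map.of(1, 3, 2, 1));
        edges.put(1, Map.of(3, 1));
        edges.put(2, Map.of(1, 1, 3, 5));
        edges.put(3, Map.of());

        // Solution set: the current best known distance of every vertex.
        Map<Integer, Integer> dist = new HashMap<>();
        for (int v : edges.keySet()) dist.put(v, Integer.MAX_VALUE);
        dist.put(0, 0); // source vertex

        // Workset: only vertices whose value changed propagate in the next superstep.
        Set<Integer> workset = Set.of(0);
        while (!workset.isEmpty()) {
            Set<Integer> next = new HashSet<>();
            for (int v : workset) {
                for (Map.Entry<Integer, Integer> e : edges.get(v).entrySet()) {
                    int candidate = dist.get(v) + e.getValue();
                    if (candidate < dist.get(e.getKey())) {
                        dist.put(e.getKey(), candidate); // delta-update the solution set
                        next.add(e.getKey());            // changed vertex re-enters the workset
                    }
                }
            }
            workset = next;
        }

        System.out.println(new TreeMap<>(dist)); // prints {0=0, 1=2, 2=1, 3=3}
    }
}
```

The loop terminates exactly when the workset drains, which is the single-machine analogue of a delta iteration converging once no vertex updates its value.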
+ +
+ +Internally, a vertex-centric iteration is a Flink delta iteration, where the initial Solution Set +is the vertex set of the input graph and the Workset is created by selecting the active vertices, +i.e. the ones that updated their value in the previous iteration. The messaging and vertex-update +functions are user-defined functions wrapped inside coGroup operators. In each superstep, +the active vertices (Workset) are coGrouped with the edges to generate the neighborhoods for +each vertex. The messaging function is then applied on each neighborhood. Next, the result of the +messaging function is coGrouped with the current vertex values (Solution Set) and the user-defined +vertex-update function is applied on the result. The output of this coGroup operator is finally +used to update the Solution Set and create the Workset input for the next iteration. + +
+ +
#### Gather-Sum-Apply

Gelly supports a variation of the popular Gather-Sum-Apply-Scatter computation model, introduced by PowerGraph. In GSA, a vertex pulls information from its neighbors, as opposed to the vertex-centric approach, where updates are pushed from the incoming neighbors. The `runGatherSumApplyIteration()` method accepts three user-defined functions:

1. **GatherFunction:** gathers neighboring partial values along in-edges.
2. **SumFunction:** accumulates/reduces the values into a single one.
3. **ApplyFunction:** uses the result computed in the sum phase to update the current vertex’s value.

Similarly to vertex-centric, GSA leverages Flink’s delta iteration operators, as, in many cases, vertex values do not need to be recomputed during an iteration.

Let us reconsider the Single Source Shortest Paths algorithm. In each iteration, a vertex:

1. **Gather**: retrieves the distances of its neighbors, summed up with the corresponding edge values;
2. **Sum**: compares the newly obtained distances in order to extract the minimum;
3. **Apply**: finally adopts the minimum distance computed in the sum step, provided that it is lower than its current value. If a vertex’s value does not change during an iteration, it no longer propagates its distance.

Internally, a Gather-Sum-Apply iteration is a Flink delta iteration where the initial solution set is the vertex input set and the workset is created by selecting the active vertices.

The three functions (gather, sum and apply) are user-defined functions wrapped in map, reduce and join operators, respectively. In each superstep, the active vertices are joined with the edges in order to create neighborhoods for each vertex. The gather function is then applied on the neighborhood values via a map function. Afterwards, the result is grouped by the vertex ID and reduced using the sum function.
Finally, the outcome of the sum phase is joined with the +current vertex values (solution set), the values are updated, thus creating a new workset that +serves as input for the next iteration. + +
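For a single vertex, one GSA superstep can be sketched as three plain-Java steps. This is a conceptual sketch only, not Gelly's GatherFunction/SumFunction/ApplyFunction interfaces, and the distances are made up.

```java
import java.util.*;

public class GsaSuperstepSketch {
    public static void main(String[] args) {
        // In-edges of one vertex as {neighborDistance, edgeValue} pairs (made-up numbers).
        int[][] inEdges = {{2, 3}, {1, 5}};
        int currentDistance = 7;

        // Gather: each in-edge contributes neighbor distance + edge value.
        int[] gathered = Arrays.stream(inEdges).mapToInt(e -> e[0] + e[1]).toArray();

        // Sum: reduce the gathered values to a single minimum.
        int minimum = Arrays.stream(gathered).min().orElse(Integer.MAX_VALUE);

        // Apply: adopt the new distance only if it improves on the current one.
        int updated = Math.min(currentDistance, minimum);

        System.out.println(updated); // min(7, min(2+3, 1+5)) = 5
    }
}
```

Note how the vertex pulls values along its in-edges (gather) rather than waiting for pushed messages, which is the defining difference from the vertex-centric model described above.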
+ +
+ +[Back to top](#top) + +## Library of Graph Algorithms + +We are building a library of graph algorithms in Gelly, to easily analyze large-scale graphs. +These algorithms extend the `GraphAlgorithm` interface and can be simply executed on +the input graph by calling a `run()` method. + +We currently have implementations of the following algorithms: + +1. PageRank +2. Single-Source-Shortest-Paths +3. Label Propagation +4. Community Detection (based on [this paper](http://arxiv.org/pdf/0808.2633.pdf)) +5. Connected Components +6. GSA Connected Components +7. GSA PageRank +8. GSA Single-Source-Shortest-Paths + +Gelly also offers implementations of common graph algorithms through [examples](https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example). +Among them, one can find graph weighting schemes, like Jaccard Similarity and Euclidean Distance Weighting, +as well as computation of common graph metrics. + +[Back to top](#top) + +## Use-Case: Music Profiles + +In the following section, we go through a use-case scenario that combines the Flink DataSet API +with Gelly in order to process users’ music preferences to suggest additions to their playlist. + +First, we read a user’s music profile which is in the form of user-id, song-id and the number of +plays that each song has. We then filter out the list of songs the users do not wish to see in their +playlist. Then we compute the top songs per user (i.e. the songs a user listened to the most). +Finally, as a separate use-case on the same data set, we create a user-user similarity graph based +on the common songs and use this resulting graph to detect communities by calling Gelly’s Label Propagation +library method. + +For running the example implementation, please use the 0.10-SNAPSHOT version of Flink as a +dependency. 
The full example code base can be found [here](https://github.com/apache/flink/blob/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/MusicProfiles.java). The public data set used for testing +can be found [here](http://labrosa.ee.columbia.edu/millionsong/tasteprofile). This data set contains **48,373,586** real user-id, song-id and +play-count triplets. + +**Note:** The code snippets in this post try to reduce verbosity by skipping type parameters of generic functions. Please have a look at [the full example](https://github.com/apache/flink/blob/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/MusicProfiles.java) for the correct and complete code. + +#### Filtering out Bad Records + +After reading the `(user-id, song-id, play-count)` triplets from a CSV file and after parsing a +text file in order to retrieve the list of songs that a user would not want to include in a +playlist, we use a coGroup function to filter out the mismatches. + +```java +// read the user-song-play triplets. +DataSet> triplets = + getUserSongTripletsData(env); + +// read the mismatches dataset and extract the songIDs +DataSet> validTriplets = triplets + .coGroup(mismatches).where(1).equalTo(0) + .with(new CoGroupFunction() { + void coGroup(Iterable triplets, Iterable invalidSongs, Collector out) { + if (!invalidSongs.iterator().hasNext()) { + for (Tuple3 triplet : triplets) { // valid triplet + out.collect(triplet); + } + } + } + } +``` + +The coGroup simply takes the triplets whose song-id (second field) matches the song-id from the +mismatches list (first field) and if the iterator was empty for a certain triplet, meaning that +there were no mismatches found, the triplet associated with that song is collected. + +#### Compute the Top Songs per User + +As a next step, we would like to see which songs a user played more often. 
To this end, we +build a user-song weighted, bipartite graph in which edge source vertices are users, edge target +vertices are songs and where the weight represents the number of times the user listened to that +certain song. + +
+ +
+ +```java +// create a user -> song weighted bipartite graph where the edge weights +// correspond to play counts +Graph userSongGraph = Graph.fromTupleDataSet(validTriplets, env); +``` + +Consult the [Gelly guide]({{< param DocsBaseUrl >}}flink-docs-master/dev/libs/gelly/) for guidelines +on how to create a graph from a given DataSet of edges or from a collection. + +To retrieve the top songs per user, we call the groupReduceOnEdges function as it perform an +aggregation over the first hop neighborhood taking just the edges into consideration. We will +basically iterate through the edge value and collect the target (song) of the maximum weight edge. + +```java +//get the top track (most listened to) for each user +DataSet usersWithTopTrack = userSongGraph + .groupReduceOnEdges(new GetTopSongPerUser(), EdgeDirection.OUT); + +class GetTopSongPerUser implements EdgesFunctionWithVertexValue { + void iterateEdges(Vertex vertex, Iterable edges) { + int maxPlaycount = 0; + String topSong = ""; + + for (Edge edge : edges) { + if (edge.getValue() > maxPlaycount) { + maxPlaycount = edge.getValue(); + topSong = edge.getTarget(); + } + } + return new Tuple2(vertex.getId(), topSong); + } +} +``` + +#### Creating a User-User Similarity Graph + +Clustering users based on common interests, in this case, common top songs, could prove to be +very useful for advertisements or for recommending new musical compilations. In a user-user graph, +two users who listen to the same song will simply be linked together through an edge as depicted +in the figure below. + +
+ +
+ +To form the user-user graph in Flink, we will simply take the edges from the user-song graph +(left-hand side of the image), group them by song-id, and then add all the users (source vertex ids) +to an ArrayList. + +We then match users who listened to the same song two by two, creating a new edge to mark their +common interest (right-hand side of the image). + +Afterwards, we perform a `distinct()` operation to avoid creation of duplicate data. +Considering that we now have the DataSet of edges which present interest, creating a graph is as +straightforward as a call to the `Graph.fromDataSet()` method. + +```java +// create a user-user similarity graph: +// two users that listen to the same song are connected +DataSet similarUsers = userSongGraph.getEdges() + // filter out user-song edges that are below the playcount threshold + .filter(new FilterFunction>() { + public boolean filter(Edge edge) { + return (edge.getValue() > playcountThreshold); + } + }) + .groupBy(1) + .reduceGroup(new GroupReduceFunction() { + void reduce(Iterable edges, Collector out) { + List users = new ArrayList(); + for (Edge edge : edges) { + users.add(edge.getSource()); + for (int i = 0; i < users.size() - 1; i++) { + for (int j = i+1; j < users.size() - 1; j++) { + out.collect(new Edge(users.get(i), users.get(j))); + } + } + } + } + }) + .distinct(); + +Graph similarUsersGraph = Graph.fromDataSet(similarUsers).getUndirected(); +``` + +After having created a user-user graph, it would make sense to detect the various communities +formed. To do so, we first initialize each vertex with a numeric label using the +`joinWithVertices()` function that takes a data set of Tuple2 as a parameter and joins +the id of a vertex with the first element of the tuple, afterwards applying a map function. +Finally, we call the `run()` method with the LabelPropagation library method passed +as a parameter. In the end, the vertices will be updated to contain the most frequent label +among their neighbors. 
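The label-propagation update rule itself can be sketched per vertex in a few lines of plain Java. This is a conceptual sketch of one round, not Gelly's LabelPropagation implementation, and the neighbor labels are made up: each vertex adopts the most frequent label among its neighbors.

```java
import java.util.*;

public class LabelPropagationSketch {
    // One label-propagation update: adopt the most frequent neighbor label.
    static long mostFrequent(List<Long> neighborLabels) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long label : neighborLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // A vertex whose neighbors carry community labels 1, 2, 2 adopts label 2.
        System.out.println(mostFrequent(List.of(1L, 2L, 2L)));
    }
}
```

Repeating this update until no vertex changes its label partitions the graph into communities, which is what the library call below performs in a distributed fashion.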
+ +```java +// detect user communities using label propagation +// initialize each vertex with a unique numeric label +DataSet> idsWithInitialLabels = DataSetUtils + .zipWithUniqueId(similarUsersGraph.getVertexIds()) + .map(new MapFunction, Tuple2>() { + @Override + public Tuple2 map(Tuple2 tuple2) throws Exception { + return new Tuple2(tuple2.f1, tuple2.f0); + } + }); + +// update the vertex values and run the label propagation algorithm +DataSet verticesWithCommunity = similarUsersGraph + .joinWithVertices(idsWithlLabels, new MapFunction() { + public Long map(Tuple2 idWithLabel) { + return idWithLabel.f1; + } + }) + .run(new LabelPropagation(numIterations)) + .getVertices(); +``` + +[Back to top](#top) + +## Ongoing and Future Work + +Currently, Gelly matches the basic functionalities provided by most state-of-the-art graph +processing systems. Our vision is to turn Gelly into more than “yet another library for running +PageRank-like algorithms” by supporting generic iterations, implementing graph partitioning, +providing bipartite graph support and by offering numerous other features. + +We are also enriching Flink Gelly with a set of operators suitable for highly skewed graphs +as well as a Graph API built on Flink Streaming. + +In the near future, we would like to see how Gelly can be integrated with graph visualization +tools, graph database systems and sampling techniques. + +Curious? Read more about our plans for Gelly in the [roadmap](https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly). 
+ +[Back to top](#top) + +## Links +[Gelly Documentation]({{< param DocsBaseUrl >}}flink-docs-master/dev/libs/gelly/) \ No newline at end of file diff --git a/docs/content.tr/posts/2015-09-01-release-0.9.1.md b/docs/content.tr/posts/2015-09-01-release-0.9.1.md new file mode 100644 index 0000000000..beaa6e5eb5 --- /dev/null +++ b/docs/content.tr/posts/2015-09-01-release-0.9.1.md @@ -0,0 +1,58 @@ +--- +date: "2015-09-01T08:00:00Z" +title: Apache Flink 0.9.1 available +aliases: +- /news/2015/09/01/release-0.9.1.html +--- + +The Flink community is happy to announce that Flink 0.9.1 is now available. + +0.9.1 is a maintenance release, which includes a lot of minor fixes across +several parts of the system. We suggest all users of Flink to work with this +latest stable version. + +[Download the release](/downloads.html) and [check out the +documentation]({{ site.docs-stable }}). Feedback through the Flink mailing lists +is, as always, very welcome! + +The following [issues were fixed](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.9.1) +for this release: + +- [FLINK-1916](https://issues.apache.org/jira/browse/FLINK-1916) EOFException when running delta-iteration job +- [FLINK-2089](https://issues.apache.org/jira/browse/FLINK-2089) "Buffer recycled" IllegalStateException during cancelling +- [FLINK-2189](https://issues.apache.org/jira/browse/FLINK-2189) NullPointerException in MutableHashTable +- [FLINK-2205](https://issues.apache.org/jira/browse/FLINK-2205) Confusing entries in JM Webfrontend Job Configuration section +- [FLINK-2229](https://issues.apache.org/jira/browse/FLINK-2229) Data sets involving non-primitive arrays cannot be unioned +- [FLINK-2238](https://issues.apache.org/jira/browse/FLINK-2238) Scala ExecutionEnvironment.fromCollection does not work with Sets +- [FLINK-2248](https://issues.apache.org/jira/browse/FLINK-2248) Allow disabling of sdtout logging output +- 
[FLINK-2257](https://issues.apache.org/jira/browse/FLINK-2257) Open and close of RichWindowFunctions is not called +- [FLINK-2262](https://issues.apache.org/jira/browse/FLINK-2262) ParameterTool API misnamed function +- [FLINK-2280](https://issues.apache.org/jira/browse/FLINK-2280) GenericTypeComparator.compare() does not respect ascending flag +- [FLINK-2285](https://issues.apache.org/jira/browse/FLINK-2285) Active policy emits elements of the last window twice +- [FLINK-2286](https://issues.apache.org/jira/browse/FLINK-2286) Window ParallelMerge sometimes swallows elements of the last window +- [FLINK-2293](https://issues.apache.org/jira/browse/FLINK-2293) Division by Zero Exception +- [FLINK-2298](https://issues.apache.org/jira/browse/FLINK-2298) Allow setting custom YARN application names through the CLI +- [FLINK-2347](https://issues.apache.org/jira/browse/FLINK-2347) Rendering problem with Documentation website +- [FLINK-2353](https://issues.apache.org/jira/browse/FLINK-2353) Hadoop mapred IOFormat wrappers do not respect JobConfigurable interface +- [FLINK-2356](https://issues.apache.org/jira/browse/FLINK-2356) Resource leak in checkpoint coordinator +- [FLINK-2361](https://issues.apache.org/jira/browse/FLINK-2361) CompactingHashTable loses entries +- [FLINK-2362](https://issues.apache.org/jira/browse/FLINK-2362) distinct is missing in DataSet API documentation +- [FLINK-2381](https://issues.apache.org/jira/browse/FLINK-2381) Possible class not found Exception on failed partition producer +- [FLINK-2384](https://issues.apache.org/jira/browse/FLINK-2384) Deadlock during partition spilling +- [FLINK-2386](https://issues.apache.org/jira/browse/FLINK-2386) Implement Kafka connector using the new Kafka Consumer API +- [FLINK-2394](https://issues.apache.org/jira/browse/FLINK-2394) HadoopOutFormat OutputCommitter is default to FileOutputCommiter +- [FLINK-2412](https://issues.apache.org/jira/browse/FLINK-2412) Race leading to IndexOutOfBoundsException when querying 
for buffer while releasing SpillablePartition +- [FLINK-2422](https://issues.apache.org/jira/browse/FLINK-2422) Web client is showing a blank page if "Meta refresh" is disabled in browser +- [FLINK-2424](https://issues.apache.org/jira/browse/FLINK-2424) InstantiationUtil.serializeObject(Object) does not close output stream +- [FLINK-2437](https://issues.apache.org/jira/browse/FLINK-2437) TypeExtractor.analyzePojo has some problems around the default constructor detection +- [FLINK-2442](https://issues.apache.org/jira/browse/FLINK-2442) PojoType fields not supported by field position keys +- [FLINK-2447](https://issues.apache.org/jira/browse/FLINK-2447) TypeExtractor returns wrong type info when a Tuple has two fields of the same POJO type +- [FLINK-2450](https://issues.apache.org/jira/browse/FLINK-2450) IndexOutOfBoundsException in KryoSerializer +- [FLINK-2460](https://issues.apache.org/jira/browse/FLINK-2460) ReduceOnNeighborsWithExceptionITCase failure +- [FLINK-2527](https://issues.apache.org/jira/browse/FLINK-2527) If a VertexUpdateFunction calls setNewVertexValue more than once, the MessagingFunction will only see the first value set +- [FLINK-2540](https://issues.apache.org/jira/browse/FLINK-2540) LocalBufferPool.requestBuffer gets into infinite loop +- [FLINK-2542](https://issues.apache.org/jira/browse/FLINK-2542) It should be documented that it is required from a join key to override hashCode(), when it is not a POJO +- [FLINK-2555](https://issues.apache.org/jira/browse/FLINK-2555) Hadoop Input/Output Formats are unable to access secured HDFS clusters +- [FLINK-2560](https://issues.apache.org/jira/browse/FLINK-2560) Flink-Avro Plugin cannot be handled by Eclipse +- [FLINK-2572](https://issues.apache.org/jira/browse/FLINK-2572) Resolve base path of symlinked executable +- [FLINK-2584](https://issues.apache.org/jira/browse/FLINK-2584) ASM dependency is not shaded away diff --git a/docs/content.tr/posts/2015-09-03-flink-forward.md 
b/docs/content.tr/posts/2015-09-03-flink-forward.md new file mode 100644 index 0000000000..e4107cdc7b --- /dev/null +++ b/docs/content.tr/posts/2015-09-03-flink-forward.md @@ -0,0 +1,44 @@
+---
+date: "2015-09-03T08:00:00Z"
+title: Announcing Flink Forward 2015
+aliases:
+- /news/2015/09/03/flink-forward.html
+---
+
+[Flink Forward 2015](http://2015.flink-forward.org/) is the first
+conference with Flink at its center that aims to bring together the
+Apache Flink community in a single place. The conference takes place
+on October 12 and 13 in Berlin, the place where Apache Flink started.
+
+ +
+ +The [conference program](http://2015.flink-forward.org/?post_type=day) has +been announced by the organizers and a program committee consisting of +Flink PMC members. The agenda contains talks from industry and +academia as well as a dedicated session on hands-on Flink training. + +Some highlights of the talks include + +- A keynote by [William + Vambenepe](http://2015.flink-forward.org/?speaker=william-vambenepe), + lead of the product management team responsible for Big Data + services on Google Cloud Platform (BigQuery, Dataflow, etc...) on + data streaming, Google Cloud Dataflow, and Apache Flink. + +- Talks by several practitioners on how they are putting Flink to work + in their projects, including ResearchGate, Bouygues Telecom, + Amadeus, Telefonica, Capital One, Ericsson, and Otto Group. + +- Talks on how open source projects, including Apache Mahout, Apache + SAMOA (incubating), Apache Zeppelin (incubating), Apache BigTop, and + Apache Storm integrate with Apache Flink. + +- Talks by Flink committers on several aspects of the system, such as + fault tolerance, the internal runtime architecture, and others. + +Check out the [schedule](http://2015.flink-forward.org/?post_type=day) and +register for the conference. + diff --git a/docs/content.tr/posts/2015-09-16-off-heap-memory.md b/docs/content.tr/posts/2015-09-16-off-heap-memory.md new file mode 100644 index 0000000000..7964d72945 --- /dev/null +++ b/docs/content.tr/posts/2015-09-16-off-heap-memory.md @@ -0,0 +1,875 @@ +--- +author: Stephan Ewen +author-twitter: stephanewen +date: "2015-09-16T08:00:00Z" +excerpt: |- +

Running data-intensive code in the JVM and making it well-behaved is tricky. Systems that put billions of data objects naively onto the JVM heap face unpredictable OutOfMemoryErrors and Garbage Collection stalls. Of course, you still want to keep your data in memory as much as possible, for speed and responsiveness of the processing applications. In that context, "off-heap" has become almost something like a magic word to solve these problems.

+

In this blog post, we will look at how Flink exploits off-heap memory. The feature is part of the upcoming release, but you can try it out with the latest nightly builds. We will also give a few interesting insights into the behavior of Java's JIT compiler for highly optimized methods and loops.

+title: Off-heap Memory in Apache Flink and the curious JIT compiler
+aliases:
+- /news/2015/09/16/off-heap-memory.html
+---
+
+Running data-intensive code in the JVM and making it well-behaved is tricky. Systems that put billions of data objects naively onto the JVM heap face unpredictable OutOfMemoryErrors and Garbage Collection stalls. Of course, you still want to keep your data in memory as much as possible, for speed and responsiveness of the processing applications. In that context, "off-heap" has become almost something like a magic word to solve these problems.
+
+In this blog post, we will look at how Flink exploits off-heap memory. The feature is part of the upcoming release, but you can try it out with the latest nightly builds. We will also give a few interesting insights into the behavior of Java's JIT compiler for highly optimized methods and loops.
+
+
+## Recap: Memory Management in Flink
+
+To understand Flink’s approach to off-heap memory, we need to recap Flink’s approach to custom managed memory. We have written an [earlier blog post about how Flink manages JVM memory itself](/news/2015/05/11/Juggling-with-Bits-and-Bytes.html).
+
+As a summary, the core part is that Flink implements its algorithms not against Java objects, arrays, or lists, but against a data structure similar to `java.nio.ByteBuffer`. Flink uses its own specialized version, called [`MemorySegment`](https://github.com/apache/flink/blob/release-0.9.1-rc1/flink-core/src/main/java/org/apache/flink/core/memory/MemorySegment.java), on which algorithms put and get ints, longs, byte arrays, etc. at specific positions, and compare and copy memory. The memory segments are held and distributed by a central component (called `MemoryManager`) from which algorithms request segments according to their calculated memory budgets.
+
+Don't believe that this can be fast?
Have a look at the [benchmarks in the earlier blogpost](/news/2015/05/11/Juggling-with-Bits-and-Bytes.html), which show that it is actually often much faster than working on objects, due to better control over data layout (cache efficiency, data size), and reduced pressure on Java's Garbage Collector.
+
+This form of memory management has been in Flink for a long time. Anecdotally, the first public demo of Flink's predecessor project *Stratosphere*, at the VLDB conference in 2010, was running its programs with custom managed memory (although I believe few attendees were aware of that).
+
+
+## Why actually bother with off-heap memory?
+
+Given that Flink has a sophisticated level of managing on-heap memory, why do we even bother with off-heap memory? It is true that *"out of memory"* has been much less of a problem for Flink because of its heap memory management techniques. Nonetheless, there are a few good reasons to offer the possibility to move Flink's managed memory out of the JVM heap:
+
+ - Very large JVMs (100s of GBytes heap memory) tend to be tricky. It takes long to start them (allocate and initialize the heap) and garbage collection stalls can be huge (minutes). While newer incremental garbage collectors (like G1) mitigate this problem to some extent, an even better solution is to just make the heap much smaller and allocate Flink's managed memory chunks outside the heap.
+
+ - I/O and network efficiency: In many cases, we write MemorySegments to disk (spilling) or to the network (data transfer). Off-heap memory can be written/transferred with zero copies, while heap memory always incurs an additional memory copy.
+
+ - Off-heap memory can actually be owned by other processes. That way, cached data survives process crashes (due to user code exceptions) and can be used for recovery. Flink does not exploit that, yet, but it is interesting future work.
+
+
+The opposite question is also valid. Why should Flink ever not use off-heap memory?
+
+ - On-heap is easier and interplays better with tools. Some container environments and monitoring tools get confused when the monitored heap size does not remotely reflect the amount of memory used by the process.
+
+ - Short-lived memory segments are cheaper on the heap. Flink sometimes needs to allocate some short-lived buffers, which is cheaper to do on the heap than off-heap.
+
+ - Some operations are actually a bit faster on heap memory (or the JIT compiler understands them better).
+
+
+## The off-heap Memory Implementation
+
+Given that all memory-intensive internal algorithms are already implemented against the `MemorySegment`, our implementation to switch to off-heap memory is actually trivial. You can compare it to replacing all `ByteBuffer.allocate(numBytes)` calls with `ByteBuffer.allocateDirect(numBytes)`. In Flink's case it meant that we made the `MemorySegment` abstract and added the `HeapMemorySegment` and `OffHeapMemorySegment` subclasses. The `OffHeapMemorySegment` takes the off-heap memory pointer from a `java.nio.DirectByteBuffer` and implements its specialized access methods using `sun.misc.Unsafe`. We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, *-XX:MaxDirectMemorySize*).
+
+In practice we had to go one step further to make the implementation perform well. While the `ByteBuffer` is used in I/O code paths to compose headers and move bulk memory into place, the MemorySegment is part of the innermost loops of many algorithms (sorting, hash tables, ...). That means that the access methods have to be as fast as possible.
+
+
+## Understanding the JIT and tuning the implementation
+
+The `MemorySegment` was (before our change) a standalone class; it was *final* (had no subclasses). Via *Class Hierarchy Analysis (CHA)*, the JIT compiler was able to determine that all of the accessor method calls go to one specific implementation.
That way, all method calls can be perfectly de-virtualized and inlined, which is essential to performance, and the basis for all further optimizations (like vectorization of the calling loop). + +With two different memory segments loaded at the same time, the JIT compiler cannot perform the same level of optimization any more, which results in a noticeable difference in performance: A slowdown of about 2.7 x in the following example: + +``` +Writing 100000 x 32768 bytes to 32768 bytes segment: + +HeapMemorySegment (standalone) : 1,441 msecs +OffHeapMemorySegment (standalone) : 1,628 msecs +HeapMemorySegment (subclass) : 3,841 msecs +OffHeapMemorySegment (subclass) : 3,847 msecs +``` + +To get back to the original performance, we explored two approaches: + +### Approach 1: Make sure that only one memory segment implementation is ever loaded. + +We re-structured the code a bit to make sure that all places that produce long-lived and short-lived memory segments instantiate the same MemorySegment subclass (Heap- or Off-Heap segment). Using factories rather than directly instantiating the memory segment classes, this was straightforward. + +Experiments (see appendix) showed that the JIT compiler properly detects this (via hierarchy analysis) and that it can perform the same level of aggressive optimization as before, when there was only one `MemorySegment` class. + + +### Approach 2: Write one segment that handles both heap and off-heap memory + +We created a class `HybridMemorySegment` which handles transparently both heap- and off-heap memory. It can be initialized either with a byte array (heap memory), or with a pointer to a memory region outside the heap (off-heap memory). + +Fortunately, there is a nice trick to do this without introducing code branches and specialized handling of the two different memory types. The trick is based on the way that the `sun.misc.Unsafe` methods interpret object references. 
To illustrate this, we take the method that gets a long integer from a memory position:
+
+```
+sun.misc.Unsafe.getLong(Object reference, long offset)
+```
+
+The method accepts an object reference, takes its memory address, and adds the offset to obtain a pointer. It then fetches the eight bytes at the address pointed to and interprets them as a long integer. Since the method accepts *null* as the reference (and interprets it as *zero*), one can write a method that fetches a long integer seamlessly from heap and off-heap memory as follows:
+
+```java
+public class HybridMemorySegment {
+
+    private final byte[] heapMemory;  // non-null in heap case, null in off-heap case
+    private final long address;       // may be absolute, or relative to byte[]
+
+
+    // method of interest
+    public long getLong(int pos) {
+        return UNSAFE.getLong(heapMemory, address + pos);
+    }
+
+
+    // initialize for heap memory
+    public HybridMemorySegment(byte[] heapMemory) {
+        this.heapMemory = heapMemory;
+        this.address = UNSAFE.arrayBaseOffset(byte[].class);
+    }
+
+    // initialize for off-heap memory
+    public HybridMemorySegment(long offheapPointer) {
+        this.heapMemory = null;
+        this.address = offheapPointer;
+    }
+}
+```
+
+To check whether both cases (heap and off-heap) really result in the same code paths (no hidden branches inside the `Unsafe.getLong(Object, long)` method) one can check out the C++ source code of `sun.misc.Unsafe`, available here: [http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/prims/unsafe.cpp](http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/prims/unsafe.cpp)
+
+Of particular interest is the macro in line 155, which is the base of all GET methods. Tracing the function calls (many are no-ops), one can see that both variants of Unsafe’s `getLong()` result in the same code:
+Either `0 + absolutePointer` or `objectRefAddress + offset`.
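As an aside, the "one code path for both memory kinds" idea can also be sketched without `sun.misc.Unsafe`, by backing a segment with a `java.nio.ByteBuffer` that is either heap-allocated or direct-allocated. This is only an illustrative analogue (the class and method names below are made up for this sketch, not Flink's actual implementation):

```java
import java.nio.ByteBuffer;

// Illustrative analogue of the hybrid idea: a single accessor code path,
// regardless of whether the backing memory lives on or off the JVM heap.
class HybridSegmentSketch {

    private final ByteBuffer buffer; // heap or direct -- callers cannot tell

    private HybridSegmentSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    static HybridSegmentSketch onHeap(int size) {
        return new HybridSegmentSketch(ByteBuffer.allocate(size));       // backed by a byte[]
    }

    static HybridSegmentSketch offHeap(int size) {
        return new HybridSegmentSketch(ByteBuffer.allocateDirect(size)); // backed by native memory
    }

    // one getLong/putLong implementation serves both memory kinds
    long getLong(int pos) { return buffer.getLong(pos); }
    void putLong(int pos, long value) { buffer.putLong(pos, value); }

    public static void main(String[] args) {
        HybridSegmentSketch heap = HybridSegmentSketch.onHeap(64);
        HybridSegmentSketch direct = HybridSegmentSketch.offHeap(64);
        heap.putLong(8, 42L);
        direct.putLong(8, 42L);
        // same accessor code runs against both memory kinds
        System.out.println(heap.getLong(8) == direct.getLong(8));
    }
}
```

Note that, unlike the `Unsafe` trick, the `ByteBuffer` methods do branch internally between heap and direct buffers, which is exactly the dispatch cost the `HybridMemorySegment` avoids.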
+
+
+## Summary
+
+We ended up choosing a combination of both techniques:
+
+ - For off-heap memory, we use the `HybridMemorySegment` from approach (2), which can represent both heap and off-heap memory. That way, the same class represents the long-lived off-heap memory as well as the short-lived temporary buffers allocated (or wrapped) on the heap.
+
+ - We follow approach (1) and use factories to make sure that only one segment implementation is ever loaded, which gives peak performance. We can exploit the performance benefits of the `HeapMemorySegment` on individual byte operations, and we have a mechanism in place to add further implementations of `MemorySegments` for the case that Oracle really removes `sun.misc.Unsafe` in future Java versions.
+
+The final code can be found in the Flink repository, under [https://github.com/apache/flink/tree/master/flink-core/src/main/java/org/apache/flink/core/memory](https://github.com/apache/flink/tree/master/flink-core/src/main/java/org/apache/flink/core/memory)
+
+Detailed micro benchmarks are in the appendix. A summary of the findings is as follows:
+
+ - The `HybridMemorySegment` performs equally well in heap and off-heap memory, as is to be expected (the code paths are the same).
+
+ - The `HeapMemorySegment` is quite a bit faster at reading individual bytes, not so much at writing them. Access to a *byte[]* is after all a bit cheaper than an invocation of a `sun.misc.Unsafe` method, even when JIT-ed.
+
+ - The abstract class `MemorySegment` (with its subclasses `HeapMemorySegment` and `HybridMemorySegment`) performs as well as any specialized non-abstract class, as long as only one subclass is loaded. When both are loaded, performance may suffer by a factor of 2.7x on certain operations.
+
+ - How badly the performance degrades when both MemorySegment subclasses are loaded seems to depend heavily on the order in which the subclasses are loaded and exercised. Sometimes, performance is affected more than other times.
It seems to be an artifact of the JIT’s code profiling and how heavily it performs optimistic specialization towards certain subclasses.
+
+
+There is still a bit of mystery left, specifically why code is sometimes faster when it performs more checks (has more instructions and an additional branch). Even though the branch is perfectly predictable, this seems counter-intuitive. The only explanation that we could come up with is that the branch optimizations (such as optimistic elimination) result in code that does better register allocation (for whatever reason, maybe the intermediate instructions just fit the allocation algorithm better).
+
+## tl;dr
+
+ - Off-heap memory in Flink complements the already very fast on-heap memory management. It improves the scalability to very large heap sizes and reduces memory copies for network and disk I/O.
+
+ - Flink’s already present memory management infrastructure made the addition of off-heap memory simple. Off-heap memory is not only used for caching data; Flink can actually sort data off-heap and build hash tables off-heap.
+
+ - We play a few nice tricks in the implementation to make sure the code is as friendly as possible to the JIT compiler and processor, so that managed memory accesses are as fast as possible.
+
+ - Understanding the JVM’s JIT compiler is tough - one needs a lot of (randomized) micro benchmarking to examine its behavior.
+
+--------
+
+## Appendix: Detailed Micro Benchmarks
+
+These microbenchmarks test the performance of the different memory segment implementations on various operations.
+
+Each experiment tests the different implementations multiple times in different orders, to balance the advantage/disadvantage of the JIT compiler specializing towards certain code paths. All experiments were run 5x, discarding the fastest and slowest run, and then averaged. This compensates for the delay before the JIT kicks in.
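The averaging scheme just described amounts to a trimmed mean: five runs, drop the single fastest and the single slowest, and average the remaining three. A small sketch of that computation (the class and method names are made up here, not part of the benchmark code):

```java
import java.util.Arrays;

// Trimmed mean over benchmark runs: drop the single fastest and slowest
// measurement and average the rest, as done for the numbers below.
class TrimmedMean {

    static double trimmedAverage(long[] runMillis) {
        if (runMillis.length < 3) {
            throw new IllegalArgumentException("need at least 3 runs");
        }
        long[] sorted = runMillis.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) { // skip index 0 (min) and last (max)
            sum += sorted[i];
        }
        return sum / (double) (sorted.length - 2);
    }

    public static void main(String[] args) {
        long[] runs = {1441, 1500, 1430, 9000, 1445}; // one outlier run
        System.out.println(trimmedAverage(runs));     // the outlier is discarded
    }
}
```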
+ +My setup: + + - Oracle Java 8 (1.8.0_25) + - 4 GBytes JVM heap (the experiments need 1.4 GBytes Heap + 1 GBytes direct memory) + - Intel Core i7-4700MQ CPU, 2.40GHz (4 cores, 8 hardware contexts) + +The tested implementations are + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Type | Description |
| --- | --- |
| HeapMemorySegment (exclusive) | The case where it is the only loaded MemorySegment subclass. |
| HeapMemorySegment (mixed) | The case where both the HeapMemorySegment and the HybridMemorySegment are loaded. |
| HybridMemorySegment (heap-exclusive) | Backed by heap memory, and the case where it is the only loaded MemorySegment class. |
| HybridMemorySegment (heap-mixed) | Backed by heap memory, and the case where both the HeapMemorySegment and the HybridMemorySegment are loaded. |
| HybridMemorySegment (off-heap-exclusive) | Backed by off-heap memory, and the case where it is the only loaded MemorySegment class. |
| HybridMemorySegment (off-heap-mixed) | Backed by off-heap memory, and the case where both the HeapMemorySegment and the HybridMemorySegment are loaded. |
| PureHeapSegment | Has no class hierarchy and virtual methods at all. |
| PureHybridSegment (heap) | Has no class hierarchy and virtual methods at all, backed by heap memory. |
| PureHybridSegment (off-heap) | Has no class hierarchy and virtual methods at all, backed by off-heap memory. |
+ +
+

### Byte accesses

+ +

**Writing 100000 x 32768 bytes to 32768 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, exclusive | 1,441 msecs |
| HeapMemorySegment, mixed | 3,841 msecs |
| HybridMemorySegment, heap, exclusive | 1,626 msecs |
| HybridMemorySegment, off-heap, exclusive | 1,628 msecs |
| HybridMemorySegment, heap, mixed | 3,848 msecs |
| HybridMemorySegment, off-heap, mixed | 3,847 msecs |
| PureHeapSegment | 1,442 msecs |
| PureHybridSegment, heap | 1,623 msecs |
| PureHybridSegment, off-heap | 1,620 msecs |
+ +

**Reading 100000 x 32768 bytes from 32768 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, exclusive | 1,326 msecs |
| HeapMemorySegment, mixed | 1,378 msecs |
| HybridMemorySegment, heap, exclusive | 2,029 msecs |
| HybridMemorySegment, off-heap, exclusive | 2,030 msecs |
| HybridMemorySegment, heap, mixed | 2,047 msecs |
| HybridMemorySegment, off-heap, mixed | 2,049 msecs |
| PureHeapSegment | 1,331 msecs |
| PureHybridSegment, heap | 2,030 msecs |
| PureHybridSegment, off-heap | 2,030 msecs |
+ +

**Writing 10 x 1073741824 bytes to 1073741824 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, exclusive | 5,602 msecs |
| HeapMemorySegment, mixed | 12,570 msecs |
| HybridMemorySegment, heap, exclusive | 5,691 msecs |
| HybridMemorySegment, off-heap, exclusive | 5,691 msecs |
| HybridMemorySegment, heap, mixed | 12,566 msecs |
| HybridMemorySegment, off-heap, mixed | 12,556 msecs |
| PureHeapSegment | 5,599 msecs |
| PureHybridSegment, heap | 5,687 msecs |
| PureHybridSegment, off-heap | 5,681 msecs |
+ +

**Reading 10 x 1073741824 bytes from 1073741824 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, exclusive | 4,243 msecs |
| HeapMemorySegment, mixed | 4,265 msecs |
| HybridMemorySegment, heap, exclusive | 6,730 msecs |
| HybridMemorySegment, off-heap, exclusive | 6,725 msecs |
| HybridMemorySegment, heap, mixed | 6,933 msecs |
| HybridMemorySegment, off-heap, mixed | 6,926 msecs |
| PureHeapSegment | 4,247 msecs |
| PureHybridSegment, heap | 6,919 msecs |
| PureHybridSegment, off-heap | 6,916 msecs |
+ +

### Byte Array accesses

+ +

**Writing 100000 x 32 byte[1024] to 32768 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 164 msecs |
| HybridMemorySegment, heap, mixed | 163 msecs |
| HybridMemorySegment, off-heap, mixed | 163 msecs |
| PureHeapSegment | 165 msecs |
| PureHybridSegment, heap | 182 msecs |
| PureHybridSegment, off-heap | 176 msecs |
+ +

**Reading 100000 x 32 byte[1024] from 32768 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 157 msecs |
| HybridMemorySegment, heap, mixed | 155 msecs |
| HybridMemorySegment, off-heap, mixed | 162 msecs |
| PureHeapSegment | 161 msecs |
| PureHybridSegment, heap | 175 msecs |
| PureHybridSegment, off-heap | 179 msecs |
+ +

**Writing 10 x 1048576 byte[1024] to 1073741824 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 1,164 msecs |
| HybridMemorySegment, heap, mixed | 1,173 msecs |
| HybridMemorySegment, off-heap, mixed | 1,157 msecs |
| PureHeapSegment | 1,169 msecs |
| PureHybridSegment, heap | 1,174 msecs |
| PureHybridSegment, off-heap | 1,166 msecs |
+ +

**Reading 10 x 1048576 byte[1024] from 1073741824 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 854 msecs |
| HybridMemorySegment, heap, mixed | 853 msecs |
| HybridMemorySegment, off-heap, mixed | 854 msecs |
| PureHeapSegment | 857 msecs |
| PureHybridSegment, heap | 896 msecs |
| PureHybridSegment, off-heap | 887 msecs |
+ +

### Long integer accesses

+ +

(note that the heap and off-heap segments use the same or comparable code for this)

+ +

**Writing 100000 x 4096 longs to 32768 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 221 msecs |
| HybridMemorySegment, heap, mixed | 222 msecs |
| HybridMemorySegment, off-heap, mixed | 221 msecs |
| PureHeapSegment | 194 msecs |
| PureHybridSegment, heap | 220 msecs |
| PureHybridSegment, off-heap | 221 msecs |
+ +

**Reading 100000 x 4096 longs from 32768 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 233 msecs |
| HybridMemorySegment, heap, mixed | 232 msecs |
| HybridMemorySegment, off-heap, mixed | 231 msecs |
| PureHeapSegment | 232 msecs |
| PureHybridSegment, heap | 232 msecs |
| PureHybridSegment, off-heap | 233 msecs |
+ +

**Writing 10 x 134217728 longs to 1073741824 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 1,120 msecs |
| HybridMemorySegment, heap, mixed | 1,120 msecs |
| HybridMemorySegment, off-heap, mixed | 1,115 msecs |
| PureHeapSegment | 1,148 msecs |
| PureHybridSegment, heap | 1,116 msecs |
| PureHybridSegment, off-heap | 1,113 msecs |
+ +

**Reading 10 x 134217728 longs from 1073741824 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 1,097 msecs |
| HybridMemorySegment, heap, mixed | 1,099 msecs |
| HybridMemorySegment, off-heap, mixed | 1,093 msecs |
| PureHeapSegment | 917 msecs |
| PureHybridSegment, heap | 1,105 msecs |
| PureHybridSegment, off-heap | 1,097 msecs |
+ +

### Integer accesses

+ +

(note that the heap and off-heap segments use the same or comparable code for this)

+ +

**Writing 100000 x 8192 ints to 32768 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 578 msecs |
| HybridMemorySegment, heap, mixed | 580 msecs |
| HybridMemorySegment, off-heap, mixed | 576 msecs |
| PureHeapSegment | 624 msecs |
| PureHybridSegment, heap | 576 msecs |
| PureHybridSegment, off-heap | 578 msecs |
+ +

**Reading 100000 x 8192 ints from 32768 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 464 msecs |
| HybridMemorySegment, heap, mixed | 464 msecs |
| HybridMemorySegment, off-heap, mixed | 465 msecs |
| PureHeapSegment | 463 msecs |
| PureHybridSegment, heap | 464 msecs |
| PureHybridSegment, off-heap | 463 msecs |
+ +

**Writing 10 x 268435456 ints to 1073741824 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 2,187 msecs |
| HybridMemorySegment, heap, mixed | 2,161 msecs |
| HybridMemorySegment, off-heap, mixed | 2,152 msecs |
| PureHeapSegment | 2,770 msecs |
| PureHybridSegment, heap | 2,161 msecs |
| PureHybridSegment, off-heap | 2,157 msecs |
+ +

**Reading 10 x 268435456 ints from 1073741824 bytes segment**

| Segment | Time |
| --- | --- |
| HeapMemorySegment, mixed | 1,782 msecs |
| HybridMemorySegment, heap, mixed | 1,783 msecs |
| HybridMemorySegment, off-heap, mixed | 1,774 msecs |
| PureHeapSegment | 1,501 msecs |
| PureHybridSegment, heap | 1,774 msecs |
| PureHybridSegment, off-heap | 1,771 msecs |
+
+ + diff --git a/docs/content.tr/posts/2015-11-16-release-0.10.0.md b/docs/content.tr/posts/2015-11-16-release-0.10.0.md new file mode 100644 index 0000000000..9339bd496f --- /dev/null +++ b/docs/content.tr/posts/2015-11-16-release-0.10.0.md @@ -0,0 +1,170 @@ +--- +date: "2015-11-16T08:00:00Z" +title: Announcing Apache Flink 0.10.0 +aliases: +- /news/2015/11/16/release-0.10.0.html +--- + +The Apache Flink community is pleased to announce the availability of the 0.10.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on data stream processing and operational features. About 80 contributors provided bug fixes, improvements, and new features such that in total more than 400 JIRA issues could be resolved. + +For Flink 0.10.0, the focus of the community was to graduate the DataStream API from beta and to evolve Apache Flink into a production-ready stream data processor with a competitive feature set. These efforts resulted in support for event-time and out-of-order streams, exactly-once guarantees in the case of failures, a very flexible windowing mechanism, sophisticated operator state management, and a highly-available cluster operation mode. Flink 0.10.0 also brings a new monitoring dashboard with real-time system and job monitoring capabilities. Both batch and streaming modes of Flink benefit from the new high availability and improved monitoring features. Needless to say that Flink 0.10.0 includes many more features, improvements, and bug fixes. + +We encourage everyone to [download the release](/downloads.html) and [check out the documentation]({{< param DocsBaseUrl >}}flink-docs-release-0.10/). Feedback through the Flink [mailing lists](/community.html#mailing-lists) is, as always, very welcome! 
+ +## New Features + +### Event-time Stream Processing + +Many stream processing applications consume data from sources that produce events with associated timestamps such as sensor or user-interaction events. Very often, events have to be collected from several sources such that it is usually not guaranteed that events arrive in the exact order of their timestamps at the stream processor. Consequently, stream processors must take out-of-order elements into account in order to produce results which are correct and consistent with respect to the timestamps of the events. With release 0.10.0, Apache Flink supports event-time processing as well as ingestion-time and processing-time processing. See [FLINK-2674](https://issues.apache.org/jira/browse/FLINK-2674) for details. + +### Stateful Stream Processing + +Operators that maintain and update state are a common pattern in many stream processing applications. Since streaming applications tend to run for a very long time, operator state can become very valuable and impossible to recompute. In order to enable fault-tolerance, operator state must be backed up to persistent storage in regular intervals. Flink 0.10.0 offers flexible interfaces to define, update, and query operator state and hooks to connect various state backends. + +### Highly-available Cluster Operations + +Stream processing applications may be live for months. Therefore, a production-ready stream processor must be highly-available and continue to process data even in the face of failures. With release 0.10.0, Flink supports high availability modes for standalone cluster and [YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) setups, eliminating any single point of failure. In this mode, Flink relies on [Apache Zookeeper](https://zookeeper.apache.org) for leader election and persisting small sized meta-data of running jobs. 
You can [check out the documentation]({{< param DocsBaseUrl >}}flink-docs-release-0.10/setup/jobmanager_high_availability.html) to see how to enable high availability. See [FLINK-2287](https://issues.apache.org/jira/browse/FLINK-2287) for details.

### Graduated DataStream API

The DataStream API was revised based on user feedback and with foresight for upcoming features, and it graduated from beta status to fully supported. The most obvious changes are related to the methods for stream partitioning and window operations. The new windowing system is based on the concepts of window assigners, triggers, and evictors, inspired by the [Dataflow Model](http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf). The new API is fully described in the [DataStream API documentation]({{< param DocsBaseUrl >}}flink-docs-release-0.10/apis/streaming_guide.html). This [migration guide](https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.9.x+to+0.10.x) will help you port your Flink 0.9 DataStream programs to the revised API of Flink 0.10.0. See [FLINK-2674](https://issues.apache.org/jira/browse/FLINK-2674) and [FLINK-2877](https://issues.apache.org/jira/browse/FLINK-2877) for details.

### New Connectors for Data Streams

Apache Flink 0.10.0 features DataStream sources and sinks for many common data producers and stores. This includes an exactly-once rolling file sink which supports any file system, including HDFS, local FS, and S3. We also updated the [Apache Kafka](https://kafka.apache.org) producer to use the new producer API, and added connectors for [ElasticSearch](https://github.com/elastic/elasticsearch) and [Apache Nifi](https://nifi.apache.org). More connectors for DataStream programs will be added by the community in the future.
See the following JIRA issues for details: [FLINK-2583](https://issues.apache.org/jira/browse/FLINK-2583), [FLINK-2386](https://issues.apache.org/jira/browse/FLINK-2386), [FLINK-2372](https://issues.apache.org/jira/browse/FLINK-2372), [FLINK-2740](https://issues.apache.org/jira/browse/FLINK-2740), and [FLINK-2558](https://issues.apache.org/jira/browse/FLINK-2558).

### New Web Dashboard & Real-time Monitoring

The 0.10.0 release features a newly designed and significantly improved monitoring dashboard for Apache Flink. The new dashboard visualizes the progress of running jobs and shows real-time statistics of processed data volumes and record counts. Moreover, it gives access to resource usage and JVM statistics of TaskManagers, including JVM heap usage and garbage collection details. The following screenshot shows the job view of the new dashboard.
+ +
The web server that provides all monitoring statistics has been designed with a REST interface that allows other systems to access the internal system metrics as well. See [FLINK-2357](https://issues.apache.org/jira/browse/FLINK-2357) for details.

### Off-heap Managed Memory

Flink’s internal operators (such as its sort algorithm and hash tables) write data to and read data from managed memory to achieve memory-safe operations and reduce garbage collection overhead. Until version 0.10.0, managed memory was allocated only from JVM heap memory. With this release, managed memory can also be allocated from off-heap memory. This will facilitate shorter TaskManager start-up times as well as reduce garbage collection pressure. See [the documentation]({{< param DocsBaseUrl >}}flink-docs-release-0.10/setup/config.html#managed-memory) to learn how to configure managed memory on off-heap memory. JIRA issue [FLINK-1320](https://issues.apache.org/jira/browse/FLINK-1320) contains further details.

### Outer Joins

Outer joins have been one of the most frequently requested features for Flink’s [DataSet API]({{< param DocsBaseUrl >}}flink-docs-release-0.10/apis/programming_guide.html). Although there was a workaround to implement outer joins as a CoGroup function, it had significant drawbacks, including added code complexity and not being fully memory-safe. With release 0.10.0, Flink adds native support for [left, right, and full outer joins]({{< param DocsBaseUrl >}}flink-docs-release-0.10/apis/dataset_transformations.html#outerjoin) to the DataSet API. All outer joins are backed by a memory-safe operator implementation that leverages Flink’s managed memory. See [FLINK-687](https://issues.apache.org/jira/browse/FLINK-687) and [FLINK-2107](https://issues.apache.org/jira/browse/FLINK-2107) for details.
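To illustrate the semantics: a left outer join keeps every element of the left input and pads missing right-side matches with `null`. The following self-contained Java sketch is illustrative only; all names are made up, and the real DataSet operators are distributed, memory-safe implementations rather than simple in-memory hash lookups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LeftOuterJoinSketch {

    // Join each (id, name) with its matching (id, score).
    // Unlike an inner join, unmatched left rows are kept and padded with null.
    static List<String> leftOuterJoin(Map<Integer, String> left, Map<Integer, Integer> right) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, String> l : left.entrySet()) {
            Integer match = right.get(l.getKey()); // null if no right-side partner
            result.add(l.getValue() + "," + match);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> users = new TreeMap<>(); // TreeMap for deterministic order
        users.put(1, "alice");
        users.put(2, "bob");
        Map<Integer, Integer> scores = new TreeMap<>();
        scores.put(1, 42);
        System.out.println(leftOuterJoin(users, scores)); // [alice,42, bob,null]
    }
}
```

A right outer join swaps the roles of the two inputs, and a full outer join keeps unmatched rows from both sides.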
+ +### Gelly: Major Improvements and Scala API + +[Gelly]({{< param DocsBaseUrl >}}flink-docs-release-0.10/libs/gelly_guide.html) is Flink’s API and library for processing and analyzing large-scale graphs. Gelly was introduced with release 0.9.0 and has been very well received by users and contributors. Based on user feedback, Gelly has been improved since then. In addition, Flink 0.10.0 introduces a Scala API for Gelly. See [FLINK-2857](https://issues.apache.org/jira/browse/FLINK-2857) and [FLINK-1962](https://issues.apache.org/jira/browse/FLINK-1962) for details. + +## More Improvements and Fixes + +The Flink community resolved more than 400 issues. The following list is a selection of new features and fixed bugs. + +- [FLINK-1851](https://issues.apache.org/jira/browse/FLINK-1851) Java Table API does not support Casting +- [FLINK-2152](https://issues.apache.org/jira/browse/FLINK-2152) Provide zipWithIndex utility in flink-contrib +- [FLINK-2158](https://issues.apache.org/jira/browse/FLINK-2158) NullPointerException in DateSerializer. 
+- [FLINK-2240](https://issues.apache.org/jira/browse/FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join +- [FLINK-2533](https://issues.apache.org/jira/browse/FLINK-2533) Gap based random sample optimization +- [FLINK-2555](https://issues.apache.org/jira/browse/FLINK-2555) Hadoop Input/Output Formats are unable to access secured HDFS clusters +- [FLINK-2565](https://issues.apache.org/jira/browse/FLINK-2565) Support primitive arrays as keys +- [FLINK-2582](https://issues.apache.org/jira/browse/FLINK-2582) Document how to build Flink with other Scala versions +- [FLINK-2584](https://issues.apache.org/jira/browse/FLINK-2584) ASM dependency is not shaded away +- [FLINK-2689](https://issues.apache.org/jira/browse/FLINK-2689) Reusing null object for joins with SolutionSet +- [FLINK-2703](https://issues.apache.org/jira/browse/FLINK-2703) Remove log4j classes from fat jar / document how to use Flink with logback +- [FLINK-2763](https://issues.apache.org/jira/browse/FLINK-2763) Bug in Hybrid Hash Join: Request to spill a partition with less than two buffers. 
+- [FLINK-2767](https://issues.apache.org/jira/browse/FLINK-2767) Add support Scala 2.11 to Scala shell +- [FLINK-2774](https://issues.apache.org/jira/browse/FLINK-2774) Import Java API classes automatically in Flink's Scala shell +- [FLINK-2782](https://issues.apache.org/jira/browse/FLINK-2782) Remove deprecated features for 0.10 +- [FLINK-2800](https://issues.apache.org/jira/browse/FLINK-2800) kryo serialization problem +- [FLINK-2834](https://issues.apache.org/jira/browse/FLINK-2834) Global round-robin for temporary directories +- [FLINK-2842](https://issues.apache.org/jira/browse/FLINK-2842) S3FileSystem is broken +- [FLINK-2874](https://issues.apache.org/jira/browse/FLINK-2874) Certain Avro generated getters/setters not recognized +- [FLINK-2895](https://issues.apache.org/jira/browse/FLINK-2895) Duplicate immutable object creation +- [FLINK-2964](https://issues.apache.org/jira/browse/FLINK-2964) MutableHashTable fails when spilling partitions without overflow segments + +## Notice + +As previously announced, Flink 0.10.0 no longer supports Java 6. If you are still using Java 6, please consider upgrading to Java 8 (Java 7 ended its free support in April 2015). +Also note that some methods in the DataStream API had to be renamed as part of the API rework. For example the `groupBy` method has been renamed to `keyBy` and the windowing API changed. This [migration guide](https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.9.x+to+0.10.x) will help to port your Flink 0.9 DataStream programs to the revised API of Flink 0.10.0. 
+ +## Contributors + +- Alexander Alexandrov +- Marton Balassi +- Enrique Bautista +- Faye Beligianni +- Bryan Bende +- Ajay Bhat +- Chris Brinkman +- Dmitry Buzdin +- Kun Cao +- Paris Carbone +- Ufuk Celebi +- Shivani Chandna +- Liang Chen +- Felix Cheung +- Hubert Czerpak +- Vimal Das +- Behrouz Derakhshan +- Suminda Dharmasena +- Stephan Ewen +- Fengbin Fang +- Gyula Fora +- Lun Gao +- Gabor Gevay +- Piotr Godek +- Sachin Goel +- Anton Haglund +- Gábor Hermann +- Greg Hogan +- Fabian Hueske +- Martin Junghanns +- Vasia Kalavri +- Ulf Karlsson +- Frederick F. Kautz +- Samia Khalid +- Johannes Kirschnick +- Kostas Kloudas +- Alexander Kolb +- Johann Kovacs +- Aljoscha Krettek +- Sebastian Kruse +- Andreas Kunft +- Chengxiang Li +- Chen Liang +- Andra Lungu +- Suneel Marthi +- Tamara Mendt +- Robert Metzger +- Maximilian Michels +- Chiwan Park +- Sahitya Pavurala +- Pietro Pinoli +- Ricky Pogalz +- Niraj Rai +- Lokesh Rajaram +- Johannes Reifferscheid +- Till Rohrmann +- Henry Saputra +- Matthias Sax +- Shiti Saxena +- Chesnay Schepler +- Peter Schrott +- Saumitra Shahapure +- Nikolaas Steenbergen +- Thomas Sun +- Peter Szabo +- Viktor Taranenko +- Kostas Tzoumas +- Pieter-Jan Van Aeken +- Theodore Vasiloudis +- Timo Walther +- Chengxuan Wang +- Huang Wei +- Dawid Wysakowicz +- Rerngvit Yanggratoke +- Nezih Yigitbasi +- Ted Yu +- Rucong Zhang +- Vyacheslav Zholudev +- Zoltán Zvara + diff --git a/docs/content.tr/posts/2015-11-27-release-0.10.1.md b/docs/content.tr/posts/2015-11-27-release-0.10.1.md new file mode 100644 index 0000000000..7116ba44eb --- /dev/null +++ b/docs/content.tr/posts/2015-11-27-release-0.10.1.md @@ -0,0 +1,58 @@ +--- +date: "2015-11-27T08:00:00Z" +title: Flink 0.10.1 released +aliases: +- /news/2015/11/27/release-0.10.1.html +--- + +Today, the Flink community released the first bugfix release of the 0.10 series of Flink. 
We recommend all users update to this release by bumping the version of your Flink dependencies and updating the binaries on the server.

## Issues fixed
    +
- [FLINK-2879] - Links in documentation are broken
- [FLINK-2938] - Streaming docs not in sync with latest state changes
- [FLINK-2942] - Dangling operators in web UI's program visualization (non-deterministic)
- [FLINK-2967] - TM address detection might not always detect the right interface on slow networks / overloaded JMs
- [FLINK-2977] - Cannot access HBase in a Kerberos secured Yarn cluster
- [FLINK-2987] - Flink 0.10 fails to start on YARN 2.6.0
- [FLINK-2989] - Job Cancel button doesn't work on Yarn
- [FLINK-3005] - Commons-collections object deserialization remote command execution vulnerability
- [FLINK-3011] - Cannot cancel failing/restarting streaming job from the command line
- [FLINK-3019] - CLI does not list running/restarting jobs
- [FLINK-3020] - Local streaming execution: set number of task manager slots to the maximum parallelism
- [FLINK-3024] - TimestampExtractor Does not Work When returning Long.MIN_VALUE
- [FLINK-3032] - Flink does not start on Hadoop 2.7.1 (HDP), due to class conflict
- [FLINK-3043] - Kafka Connector description in Streaming API guide is wrong/outdated
- [FLINK-3047] - Local batch execution: set number of task manager slots to the maximum parallelism
- [FLINK-3052] - Optimizer does not push properties out of bulk iterations
- [FLINK-2966] - Improve the way job duration is reported on web frontend
- [FLINK-2974] - Add periodic offset commit to Kafka Consumer if checkpointing is disabled
- [FLINK-3028] - Cannot cancel restarting job via web frontend
- [FLINK-3040] - Add docs describing how to configure State Backends
- [FLINK-3041] - Twitter Streaming Description section of Streaming Programming guide refers to an incorrect example 'TwitterLocal'
+ diff --git a/docs/content.tr/posts/2015-12-04-Introducing-windows.md b/docs/content.tr/posts/2015-12-04-Introducing-windows.md new file mode 100644 index 0000000000..536928053c --- /dev/null +++ b/docs/content.tr/posts/2015-12-04-Introducing-windows.md @@ -0,0 +1,175 @@ +--- +author: Fabian Hueske +author-twitter: fhueske +date: "2015-12-04T10:00:00Z" +excerpt: |- +

  The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/“exactly-once” processing). This shift and the new terminology can be quite confusing for people who are new to the space of stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink's API features very flexible window definitions on data streams which make it stand out among other open source stream processors.

  In this blog post, we discuss the concept of windows for stream processing, present Flink's built-in windows, and explain its support for custom windowing semantics.

title: Introducing Stream Windows in Apache Flink
aliases:
- /news/2015/12/04/Introducing-windows.html
---

The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/“exactly-once” processing). This shift and the new terminology can be quite confusing for people who are new to the space of stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink's API features very flexible window definitions on data streams which make it stand out among other open source stream processors.

In this blog post, we discuss the concept of windows for stream processing, present Flink's built-in windows, and explain its support for custom windowing semantics.

## What are windows and what are they good for?

Consider the example of a traffic sensor that counts, every 15 seconds, the number of vehicles passing a certain location. The resulting stream could look like:
+ +
If you would like to know how many vehicles passed that location, you would simply sum the individual counts. However, the nature of a sensor stream is that it continuously produces data. Such a stream never ends, and it is not possible to compute a final sum that can be returned. Instead, it is possible to compute rolling sums, i.e., to return an updated sum record for each input event. This would yield a new stream of partial sums.
+ +
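The rolling sum just described can be sketched with plain Java collections. This is an illustrative stand-in for the streaming computation, not Flink API code:

```java
import java.util.ArrayList;
import java.util.List;

public class RollingSumSketch {

    // Emit one updated running total per input element: a stream of partial sums.
    static List<Integer> rollingSums(List<Integer> counts) {
        List<Integer> sums = new ArrayList<>();
        int total = 0;
        for (int c : counts) {
            total += c;
            sums.add(total);
        }
        return sums;
    }

    public static void main(String[] args) {
        // 15-second vehicle counts from the sensor
        System.out.println(rollingSums(List.of(9, 6, 8, 4))); // [9, 15, 23, 27]
    }
}
```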
However, a stream of partial sums might not be what we are looking for, because it constantly updates the count and, even more importantly, some information such as variation over time is lost. Hence, we might want to rephrase our question and ask for the number of cars that pass the location every minute. This requires us to group the elements of the stream into finite sets, each set corresponding to sixty seconds. This operation is called a *tumbling window* operation.
+ +
Tumbling windows discretize a stream into non-overlapping windows. For certain applications it is important that windows are not disjoint because an application might require smoothed aggregates. For example, we can compute every thirty seconds the number of cars that passed in the last minute. Such windows are called *sliding windows*.
+ +
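The difference between the two window types can be mimicked over a finite list of counts in plain Java. This sketch is illustrative only: window size and slide are measured in elements here, whereas Flink's time windows are defined over timestamps.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSketch {

    // Tumbling: partition the stream into disjoint groups of `size` elements and sum each.
    static List<Integer> tumblingSums(List<Integer> counts, int size) {
        List<Integer> out = new ArrayList<>();
        for (int start = 0; start + size <= counts.size(); start += size) {
            out.add(sum(counts, start, start + size));
        }
        return out;
    }

    // Sliding: windows of `size` elements that advance by `slide`, so they may overlap.
    static List<Integer> slidingSums(List<Integer> counts, int size, int slide) {
        List<Integer> out = new ArrayList<>();
        for (int start = 0; start + size <= counts.size(); start += slide) {
            out.add(sum(counts, start, start + size));
        }
        return out;
    }

    private static int sum(List<Integer> counts, int from, int to) {
        int s = 0;
        for (int i = from; i < to; i++) s += counts.get(i);
        return s;
    }

    public static void main(String[] args) {
        List<Integer> counts = List.of(9, 6, 8, 4, 7, 3);
        System.out.println(tumblingSums(counts, 2));   // [15, 12, 10]
        System.out.println(slidingSums(counts, 4, 2)); // [27, 22]
    }
}
```

Note how the sliding windows overlap: the elements in the middle of the stream contribute to more than one sum.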
+ +Defining windows on a data stream as discussed before is a non-parallel operation. This is because each element of a stream must be processed by the same window operator that decides which windows the element should be added to. Windows on a full stream are called *AllWindows* in Flink. For many applications, a data stream needs to be grouped into multiple logical streams on each of which a window operator can be applied. Think for example about a stream of vehicle counts from multiple traffic sensors (instead of only one sensor as in our previous example), where each sensor monitors a different location. By grouping the stream by sensor id, we can compute windowed traffic statistics for each location in parallel. In Flink, we call such partitioned windows simply *Windows*, as they are the common case for distributed streams. The following figure shows tumbling windows that collect two elements over a stream of `(sensorId, count)` pair elements. + +
+ +
+ +Generally speaking, a window defines a finite set of elements on an unbounded stream. This set can be based on time (as in our previous examples), element counts, a combination of counts and time, or some custom logic to assign elements to windows. Flink's DataStream API provides concise operators for the most common window operations as well as a generic windowing mechanism that allows users to define very custom windowing logic. In the following we present Flink's time and count windows before discussing its windowing mechanism in detail. + +## Time Windows + +As their name suggests, time windows group stream elements by time. For example, a tumbling time window of one minute collects elements for one minute and applies a function on all elements in the window after one minute passed. + +Defining tumbling and sliding time windows in Apache Flink is very easy: + + +```scala +// Stream of (sensorId, carCnt) +val vehicleCnts: DataStream[(Int, Int)] = ... + +val tumblingCnts: DataStream[(Int, Int)] = vehicleCnts + // key stream by sensorId + .keyBy(0) + // tumbling time window of 1 minute length + .timeWindow(Time.minutes(1)) + // compute sum over carCnt + .sum(1) + +val slidingCnts: DataStream[(Int, Int)] = vehicleCnts + .keyBy(0) + // sliding time window of 1 minute length and 30 secs trigger interval + .timeWindow(Time.minutes(1), Time.seconds(30)) + .sum(1) +``` + + + There is one aspect that we haven't discussed yet, namely the exact meaning of "*collects elements for one minute*" which boils down to the question, "*How does the stream processor interpret time?*". + +Apache Flink features three different notions of time, namely *processing time*, *event time*, and *ingestion time*. + +1. In **processing time**, windows are defined with respect to the wall clock of the machine that builds and processes a window, i.e., a one minute processing time window collects elements for exactly one minute. +1. 
In **event time**, windows are defined with respect to timestamps that are attached to each event record. This is common for many types of events, such as log entries, sensor data, etc., where the timestamp usually represents the time at which the event occurred. Event time has several benefits over processing time. First of all, it decouples the program semantics from the actual serving speed of the source and the processing performance of the system. Hence, you can process historic data, which is served at maximum speed, and continuously produced data with the same program. It also prevents semantically incorrect results in case of backpressure or delays due to failure recovery. Second, event time windows compute correct results even if events arrive out of order with respect to their timestamps, which is common if a data stream gathers events from distributed sources.
1. **Ingestion time** is a hybrid of processing and event time. It assigns wall clock timestamps to records as soon as they arrive in the system (at the source) and continues processing with event time semantics based on the attached timestamps.

## Count Windows

Apache Flink also features count windows. A tumbling count window of 100 will collect 100 events in a window and evaluate the window when the 100th element has been added.

In Flink's DataStream API, tumbling and sliding count windows are defined as follows:

```scala
// Stream of (sensorId, carCnt)
val vehicleCnts: DataStream[(Int, Int)] = ...
+ +val tumblingCnts: DataStream[(Int, Int)] = vehicleCnts + // key stream by sensorId + .keyBy(0) + // tumbling count window of 100 elements size + .countWindow(100) + // compute the carCnt sum + .sum(1) + +val slidingCnts: DataStream[(Int, Int)] = vehicleCnts + .keyBy(0) + // sliding count window of 100 elements size and 10 elements trigger interval + .countWindow(100, 10) + .sum(1) +``` + +## Dissecting Flink's windowing mechanics + +Flink's built-in time and count windows cover a wide range of common window use cases. However, there are of course applications that require custom windowing logic that cannot be addressed by Flink's built-in windows. In order to support also applications that need very specific windowing semantics, the DataStream API exposes interfaces for the internals of its windowing mechanics. These interfaces give very fine-grained control about the way that windows are built and evaluated. + +The following figure depicts Flink's windowing mechanism and introduces the components being involved. + +
+ +
+ +Elements that arrive at a window operator are handed to a `WindowAssigner`. The WindowAssigner assigns elements to one or more windows, possibly creating new windows. A `Window` itself is just an identifier for a list of elements and may provide some optional meta information, such as begin and end time in case of a `TimeWindow`. Note that an element can be added to multiple windows, which also means that multiple windows can exist at the same time. + +Each window owns a `Trigger` that decides when the window is evaluated or purged. The trigger is called for each element that is inserted into the window and when a previously registered timer times out. On each event, a trigger can decide to fire (i.e., evaluate), purge (remove the window and discard its content), or fire and then purge the window. A trigger that just fires evaluates the window and keeps it as it is, i.e., all elements remain in the window and are evaluated again when the triggers fires the next time. A window can be evaluated several times and exists until it is purged. Note that a window consumes memory until it is purged. + +When a Trigger fires, the list of window elements can be given to an optional `Evictor`. The evictor can iterate through the list and decide to cut off some elements from the start of the list, i.e., remove some of the elements that entered the window first. The remaining elements are given to an evaluation function. If no Evictor was defined, the Trigger hands all the window elements directly to the evaluation function. + +The evaluation function receives the elements of a window (possibly filtered by an Evictor) and computes one or more result elements for the window. The DataStream API accepts different types of evaluation functions, including predefined aggregation functions such as `sum()`, `min()`, `max()`, as well as a `ReduceFunction`, `FoldFunction`, or `WindowFunction`. 
A WindowFunction is the most generic evaluation function and receives the window object (i.e., the metadata of the window), the list of window elements, and the window key (in case of a keyed window) as parameters.

These are the components that constitute Flink's windowing mechanics. We now show step-by-step how to implement custom windowing logic with the DataStream API. We start with a stream of type `DataStream[IN]` and key it using a key selector function that extracts a key of type `KEY` to obtain a `KeyedStream[IN, KEY]`.

```scala
val input: DataStream[IN] = ...

// create a keyed stream using a key selector function
val keyed: KeyedStream[IN, KEY] = input
  .keyBy(myKeySel: (IN) => KEY)
```

We apply a `WindowAssigner[IN, WINDOW]` that creates windows of type `WINDOW`, resulting in a `WindowedStream[IN, KEY, WINDOW]`. In addition, a `WindowAssigner` also provides a default `Trigger` implementation.

```scala
// create windowed stream using a WindowAssigner
var windowed: WindowedStream[IN, KEY, WINDOW] = keyed
  .window(myAssigner: WindowAssigner[IN, WINDOW])
```

We can explicitly specify a `Trigger` to overwrite the default `Trigger` provided by the `WindowAssigner`. Note that specifying a trigger does not add an additional trigger condition but replaces the current trigger.

```scala
// override the default trigger of the WindowAssigner
windowed = windowed
  .trigger(myTrigger: Trigger[IN, WINDOW])
```

We may want to specify an optional `Evictor` as follows.

```scala
// specify an optional evictor
windowed = windowed
  .evictor(myEvictor: Evictor[IN, WINDOW])
```

Finally, we apply a `WindowFunction` that returns elements of type `OUT` to obtain a `DataStream[OUT]`.
+ +```scala +// apply window function to windowed stream +val output: DataStream[OUT] = windowed + .apply(myWinFunc: WindowFunction[IN, OUT, KEY, WINDOW]) +``` + +With Flink's internal windowing mechanics and its exposure through the DataStream API it is possible to implement very custom windowing logic such as session windows or windows that emit early results if the values exceed a certain threshold. + +## Conclusion + +Support for various types of windows over continuous data streams is a must-have for modern stream processors. Apache Flink is a stream processor with a very strong feature set, including a very flexible mechanism to build and evaluate windows over continuous data streams. Flink provides pre-defined window operators for common uses cases as well as a toolbox that allows to define very custom windowing logic. The Flink community will add more pre-defined window operators as we learn the requirements from our users. \ No newline at end of file diff --git a/docs/content.tr/posts/2015-12-11-storm-compatibility.md b/docs/content.tr/posts/2015-12-11-storm-compatibility.md new file mode 100644 index 0000000000..12e8e45e23 --- /dev/null +++ b/docs/content.tr/posts/2015-12-11-storm-compatibility.md @@ -0,0 +1,160 @@ +--- +author: Matthias J. Sax +author-twitter: MatthiasJSax +date: "2015-12-11T10:00:00Z" +excerpt: In this blog post, we describe Flink's compatibility package for Apache + Storm that allows to embed Spouts (sources) and Bolts (operators) in a regular + Flink streaming job. Furthermore, the compatibility package provides a Storm compatible + API in order to execute whole Storm topologies with (almost) no code adaption. 
+title: 'Storm Compatibility in Apache Flink: How to run existing Storm topologies + on Flink' +aliases: +- /news/2015/12/11/storm-compatibility.html +--- + +[Apache Storm](https://storm.apache.org) was one of the first distributed and scalable stream processing systems available in the open source space offering (near) real-time tuple-by-tuple processing semantics. +Initially released by the developers at Backtype in 2011 under the Eclipse open-source license, it became popular very quickly. +Only shortly afterwards, Twitter acquired Backtype. +Since then, Storm has been growing in popularity, is used in production at many big companies, and is the de-facto industry standard for big data stream processing. +In 2013, Storm entered the Apache incubator program, followed by its graduation to top-level in 2014. + +Apache Flink is a stream processing engine that improves upon older technologies like Storm in several dimensions, +including [strong consistency guarantees]({{< param DocsBaseUrl >}}flink-docs-master/internals/stream_checkpointing.html) ("exactly once"), +a higher level [DataStream API]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming_guide.html), +support for [event time and a rich windowing system](http://flink.apache.org/news/2015/12/04/Introducing-windows.html), +as well as [superior throughput with competitive low latency](https://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/). + +While Flink offers several technical benefits over Storm, an existing investment on a codebase of applications developed for Storm often makes it difficult to switch engines. +For these reasons, as part of the Flink 0.10 release, Flink ships with a Storm compatibility package that allows users to: + +* Run **unmodified** Storm topologies using Apache Flink benefiting from superior performance. +* **Embed** Storm code (spouts and bolts) as operators inside Flink DataStream programs. 
+ +Only minor code changes are required in order to submit the program to Flink instead of Storm. +This minimizes the work for developers to run existing Storm topologies while leveraging Apache Flink’s fast and robust execution engine. + +We note that the Storm compatibility package is continuously improving and does not cover the full spectrum of Storm’s API. +However, it is powerful enough to cover many use cases. + +## Executing Storm topologies with Flink + +
+ +
The easiest way to use the Storm compatibility package is by executing a whole Storm topology in Flink.
For this, you only need to replace the dependency `storm-core` by `flink-storm` in your Storm project and **change two lines of code** in your original Storm program.

The following example shows a simple Storm word-count program that can be executed in Flink.
First, the program is assembled the Storm way without any code change to Spouts, Bolts, or the topology itself.

```java
// assemble topology, the Storm way
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("source", new StormFileSpout(inputFilePath));
builder.setBolt("tokenizer", new StormBoltTokenizer())
       .shuffleGrouping("source");
builder.setBolt("counter", new StormBoltCounter())
       .fieldsGrouping("tokenizer", new Fields("word"));
builder.setBolt("sink", new StormBoltFileSink(outputFilePath))
       .shuffleGrouping("counter");
```

In order to execute the topology, we need to translate it to a `FlinkTopology` and submit it to a local or remote Flink cluster, very similar to submitting the application to a Storm cluster.

```java
// transform Storm topology to Flink program
// replaces: StormTopology topology = builder.createTopology();
FlinkTopology topology = FlinkTopology.createTopology(builder);

Config conf = new Config();
if(runLocal) {
	// use FlinkLocalCluster instead of LocalCluster
	FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster();
	cluster.submitTopology("WordCount", conf, topology);
} else {
	// use FlinkSubmitter instead of StormSubmitter
	FlinkSubmitter.submitTopology("WordCount", conf, topology);
}
```

As a shorter Flink-style alternative that replaces the Storm-style submission code, you can also use context-based job execution:

```java
// transform Storm topology to Flink program (as above)
FlinkTopology topology = FlinkTopology.createTopology(builder);

// executes locally by default or remotely if submitted with
Flink's command-line client
topology.execute();
```

After the code is packaged in a jar file (e.g., `StormWordCount.jar`), it can easily be submitted to Flink via

```
bin/flink run StormWordCount.jar
```

The used Spouts and Bolts as well as the topology assembly code are not changed at all!
Only the translation and submission steps have to be changed to their Storm-API compatible Flink counterparts.
This allows for minimal code changes and easy adaptation to Flink.

### Embedding Spouts and Bolts in Flink programs

It is also possible to use Spouts and Bolts within a regular Flink DataStream program.
The compatibility package provides wrapper classes for Spouts and Bolts which are implemented as a Flink `SourceFunction` and `StreamOperator`, respectively.
Those wrappers automatically translate incoming Flink POJO and `TupleXX` records into Storm's `Tuple` type, and translate emitted `Values` back into either POJOs or `TupleXX` types for further processing by Flink operators.
As Storm is type agnostic, it is required to specify the output type of embedded Spouts/Bolts manually in order to obtain a fully typed Flink streaming program.
+
+```java
+// use regular Flink streaming environment
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// use Spout as source
+DataStream<Tuple1<String>> source =
+  env.addSource(// Flink provided wrapper including original Spout
+                new SpoutWrapper<String>(new FileSpout(localFilePath)),
+                // specify output type manually
+                TypeExtractor.getForObject(new Tuple1<String>("")));
+// FileSpout cannot be parallelized
+DataStream<Tuple1<String>> text = source.setParallelism(1);
+
+// further processing with Flink
+DataStream<Tuple2<String, Integer>> tokens = text.flatMap(new Tokenizer()).keyBy(0);
+
+// use Bolt for counting
+DataStream<Tuple2<String, Integer>> counts =
+  tokens.transform("Counter",
+                   // specify output type manually
+                   TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
+                   // Flink provided wrapper including original Bolt
+                   new BoltWrapper<Tuple2<String, Integer>, Tuple2<String, Integer>>(new BoltCounter()));
+
+// write result to file via Flink sink
+counts.writeAsText(outputPath);
+
+// start Flink job
+env.execute("WordCount with Spout source and Bolt counter");
+```
+
+Although some boilerplate code is needed (we plan to address this soon!), the actual embedded Spout and Bolt code can be used unmodified.
+We also note that the resulting program is fully typed, and type errors will be found by Flink's type extractor even if the original Spouts and Bolts are not.
+
+## Outlook
+
+The Storm compatibility package is currently in beta and under continuous development.
+We are currently working on providing consistency guarantees for stateful Bolts.
+Furthermore, we want to provide a better API integration for embedded Spouts and Bolts by providing a "StormExecutionEnvironment" as a special extension of Flink's `StreamExecutionEnvironment`.
+We are also investigating the integration of Storm's higher-level programming API, Trident.
+
+## Summary
+
+Flink's compatibility package for Storm allows using unmodified Spouts and Bolts within Flink.
+This even makes it possible to embed third-party Spouts and Bolts whose source code is not available.
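To make the wrapper idea concrete without any Storm or Flink dependencies, here is a self-contained Java sketch (all names are hypothetical, not the actual `flink-storm` API): a Storm-style Bolt exposes an `execute` method, and a thin adapter drives it record by record while collecting its emissions, just as the compatibility wrappers drive unmodified Bolts inside Flink operators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, dependency-free sketch of the wrapper pattern behind flink-storm.
public class BoltAdapterSketch {

    // minimal stand-in for a Storm-style Bolt interface
    interface StormStyleBolt {
        void execute(String tuple, Consumer<String> collector);
    }

    // a "third-party" Bolt reused unmodified: splits lines into lower-case words
    static class TokenizerBolt implements StormStyleBolt {
        @Override
        public void execute(String line, Consumer<String> collector) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    collector.accept(word);
                }
            }
        }
    }

    // the "wrapper": feeds host-system records to the Bolt and gathers emissions
    static List<String> runBolt(StormStyleBolt bolt, List<String> input) {
        List<String> emitted = new ArrayList<>();
        for (String record : input) {
            bolt.execute(record, emitted::add);
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runBolt(new TokenizerBolt(), List.of("Hello Storm", "Hello Flink")));
        // prints: [hello, storm, hello, flink]
    }
}
```

The real wrappers additionally handle Storm's `Tuple`/`Values` conversion and Flink's type extraction, but the control flow is the same: the embedded Bolt code itself stays untouched.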
+
+While you can embed Spouts/Bolts in a Flink program and mix and match them with Flink operators, running whole topologies is the easiest way to get started and can be achieved with almost no code changes.
+
+If you want to try out Flink's Storm compatibility package, check out our [Documentation]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/storm_compatibility.html).
+
+ +1. We confess, there are three lines changed compared to a Storm project ---because the example covers local *and* remote execution. + diff --git a/docs/content.tr/posts/2015-12-18-a-year-in-review.md b/docs/content.tr/posts/2015-12-18-a-year-in-review.md new file mode 100644 index 0000000000..99d9b1a7ce --- /dev/null +++ b/docs/content.tr/posts/2015-12-18-a-year-in-review.md @@ -0,0 +1,215 @@ +--- +author: Robert Metzger +author-twitter: rmetzger_ +date: "2015-12-18T10:00:00Z" +excerpt:

With 2015 ending, we thought that this would be a good time to reflect on
 the amazing work done by the Flink community over this past year, and how much this
 community has grown.

+title: 'Flink 2015: A year in review, and a lookout to 2016'
+aliases:
+- /news/2015/12/18/a-year-in-review.html
+---
+
+With 2015 ending, we thought that this would be a good time to reflect
+on the amazing work done by the Flink community over this past year,
+and how much this community has grown.
+
+Overall, we have seen Flink grow in terms of functionality from an
+engine to one of the most complete open-source stream processing
+frameworks available. The community grew from a relatively small and
+geographically focused team into a truly global community, one of the
+largest big data communities in the Apache Software Foundation.
+
+We will also look at some interesting stats, including that the
+busiest days for Flink are Mondays (who would have thought :-).
+
+# Community growth
+
+Let us start with some simple statistics from [Flink's
+github repository](https://github.com/apache/flink). During 2015, the
+Flink community **doubled** in size, from about 75 contributors to
+over 150. Forks of the repository more than **tripled** from 160 in
+February 2015 to 544 in December 2015, and the number of stars of the
+repository almost tripled from 289 to 813.
+
+ +
+ +Although Flink started out geographically in Berlin, Germany, the +community is by now spread all around the globe, with many +contributors from North America, Europe, and Asia. A simple search at +meetup.com for groups that mention Flink as a focus area reveals [16 +meetups around the globe](http://apache-flink.meetup.com/): + +
+ +
+ +# Flink Forward 2015 + +One of the highlights of the year for Flink was undoubtedly the [Flink +Forward](http://2015.flink-forward.org/) conference, the first conference +on Apache Flink that was held in October in Berlin. More than 250 +participants (roughly half based outside Germany where the conference +was held) attended more than 33 technical talks from organizations +including Google, MongoDB, Bouygues Telecom, NFLabs, Euranova, RedHat, +IBM, Huawei, Intel, Ericsson, Capital One, Zalando, Amadeus, the Otto +Group, and ResearchGate. If you have not yet watched their talks, +check out the [slides](http://2015.flink-forward.org/?post_type=day) and +[videos](https://www.youtube.com/playlist?list=PLDX4T_cnKjD31JeWR1aMOi9LXPRQ6nyHO) +from Flink Forward. + +
+ +
+ +# Media coverage + +And of course, interest in Flink was picked up by the tech +media. During 2015, articles about Flink appeared in +[InfoQ](http://www.infoq.com/Apache-Flink/news/), +[ZDNet](http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/), +[Datanami](http://www.datanami.com/tag/apache-flink/), +[Infoworld](http://www.infoworld.com/article/2919602/hadoop/flink-hadoops-new-contender-for-mapreduce-spark.html) +(including being one of the [best open source big data tools of +2015](http://www.infoworld.com/article/2982429/open-source-tools/bossie-awards-2015-the-best-open-source-big-data-tools.html)), +the [Gartner +blog](http://blogs.gartner.com/nick-heudecker/apache-flink-offers-a-challenge-to-spark/), +[Dataconomy](http://dataconomy.com/tag/apache-flink/), +[SDTimes](http://sdtimes.com/tag/apache-flink/), the [MapR +blog](https://www.mapr.com/blog/apache-flink-new-way-handle-streaming-data), +[KDnuggets](http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html), +and +[HadoopSphere](http://www.hadoopsphere.com/2015/02/distributed-data-processing-with-apache.html). + +
+ +
+
+It is interesting to see that Hadoop Summit EMEA 2016 had a whopping
+17 (!) talks submitted that mention Flink in their
+title and abstract:
+
+ +
+
+# Fun with stats: when do committers commit?
+
+To get some deeper insight into what is happening in the Flink
+community, let us do some analytics on the git log of the project :-)
+The easiest thing we can do is count the number of commits at the
+repository in 2015. Running
+
+```
+git log --pretty=oneline --after=1/1/2015 | wc -l
+```
+
+on the Flink repository yields a total of **2203 commits** in 2015.
+
+To dig deeper, we will use an open source tool called gitstats that
+will give us some interesting statistics on the committer
+behavior. You can also create these yourself and see many more by
+following four easy steps:
+
+1. Download gitstats from the [project homepage](http://gitstats.sourceforge.net/). E.g., on OS X with homebrew, type
+
+```
+brew install --HEAD homebrew/head-only/gitstats
+```
+
+2. Clone the Apache Flink git repository:
+
+```
+git clone git@github.com:apache/flink.git
+```
+
+3. Generate the statistics:
+
+```
+gitstats flink/ flink-stats/
+```
+
+4. View all the statistics as an html page using your favorite browser (e.g., chrome):
+
+```
+chrome flink-stats/index.html
+```
+
+First, we can see a steady growth of lines of code in Flink since the
+initial Apache incubator project. During 2015, the codebase almost
+**doubled** from 500,000 LOC to 900,000 LOC.
+
+ +
+ +It is interesting to see when committers commit. For Flink, Monday +afternoons are by far the most popular times to commit to the +repository: + +
+ +
+ +# Feature timeline + +So, what were the major features added to Flink and the Flink +ecosystem during 2015? Here is a (non-exhaustive) chronological list: + +
+ +
+
+# Roadmap for 2016
+
+With 2015 coming to a close, the Flink community has already started
+discussing Flink's roadmap for the future. Some highlights
+are:
+
+* **Runtime scaling of streaming jobs:** streaming jobs are running
+  forever, and need to react to a changing environment. Runtime
+  scaling means dynamically increasing and decreasing the
+  parallelism of a job to sustain certain SLAs, or react to changing
+  input throughput.
+
+* **SQL queries for static data sets and streams:** building on top of
+  Flink's Table API, users should be able to write SQL
+  queries for static data sets, as well as SQL queries on data
+  streams that continuously produce new results.
+
+* **Streaming operators backed by managed memory:** currently,
+  streaming operators like user-defined state and windows are backed
+  by JVM heap objects. Moving those to Flink managed memory will add
+  the ability to spill to disk, GC efficiency, as well as better
+  control over memory utilization.
+
+* **Library for detecting temporal event patterns:** a common use case
+  for stream processing is detecting patterns in an event stream
+  with timestamps. Flink makes this possible with its support for
+  event time, so many of these operators can be surfaced in the form
+  of a library.
+
+* **Support for Apache Mesos, and resource-dynamic YARN support:**
+  support for both Mesos and YARN, including dynamic allocation and
+  release of resources for more elasticity (for both batch
+  and stream processing).
+
+* **Security:** encrypt both the messages exchanged between
+  TaskManagers and JobManager, as well as the connections for data
+  exchange between workers.
+
+* **More streaming connectors, more runtime metrics, and continuous
+  DataStream API enhancements:** add support for more sources and
+  sinks (e.g., Amazon Kinesis, Cassandra, Flume, etc.), expose more
+  metrics to the user, and provide continuous improvements to the
+  DataStream API.
+ +If you are interested in these features, we highly encourage you to +take a look at the [current +draft](https://docs.google.com/document/d/1ExmtVpeVVT3TIhO1JoBpC5JKXm-778DAD7eqw5GANwE/edit), +and [join the +discussion](https://mail-archives.apache.org/mod_mbox/flink-dev/201512.mbox/browser) +on the Flink mailing lists. + diff --git a/docs/content.tr/posts/2016-02-11-release-0.10.2.md b/docs/content.tr/posts/2016-02-11-release-0.10.2.md new file mode 100644 index 0000000000..ef606bfad8 --- /dev/null +++ b/docs/content.tr/posts/2016-02-11-release-0.10.2.md @@ -0,0 +1,34 @@ +--- +date: "2016-02-11T08:00:00Z" +title: Flink 0.10.2 Released +aliases: +- /news/2016/02/11/release-0.10.2.html +--- + +Today, the Flink community released Flink version **0.10.2**, the second bugfix release of the 0.10 series. + +We **recommend all users updating to this release** by bumping the version of your Flink dependencies to `0.10.2` and updating the binaries on the server. + +## Issues fixed + +* [FLINK-3242](https://issues.apache.org/jira/browse/FLINK-3242): Adjust StateBackendITCase for 0.10 signatures of state backends +* [FLINK-3236](https://issues.apache.org/jira/browse/FLINK-3236): Flink user code classloader as parent classloader from Flink core classes +* [FLINK-2962](https://issues.apache.org/jira/browse/FLINK-2962): Cluster startup script refers to unused variable +* [FLINK-3151](https://issues.apache.org/jira/browse/FLINK-3151): Downgrade to Netty version 4.0.27.Final +* [FLINK-3224](https://issues.apache.org/jira/browse/FLINK-3224): Call setInputType() on output formats that implement InputTypeConfigurable +* [FLINK-3218](https://issues.apache.org/jira/browse/FLINK-3218): Fix overriding of user parameters when merging Hadoop configurations +* [FLINK-3189](https://issues.apache.org/jira/browse/FLINK-3189): Fix argument parsing of CLI client INFO action +* [FLINK-3176](https://issues.apache.org/jira/browse/FLINK-3176): Improve documentation for window apply +* 
[FLINK-3185](https://issues.apache.org/jira/browse/FLINK-3185): Log error on failure during recovery +* [FLINK-3185](https://issues.apache.org/jira/browse/FLINK-3185): Don't swallow test failure Exception +* [FLINK-3147](https://issues.apache.org/jira/browse/FLINK-3147): Expose HadoopOutputFormatBase fields as protected +* [FLINK-3145](https://issues.apache.org/jira/browse/FLINK-3145): Pin Kryo version of transitive dependencies +* [FLINK-3143](https://issues.apache.org/jira/browse/FLINK-3143): Update Closure Cleaner's ASM references to ASM5 +* [FLINK-3136](https://issues.apache.org/jira/browse/FLINK-3136): Fix shaded imports in ClosureCleaner.scala +* [FLINK-3108](https://issues.apache.org/jira/browse/FLINK-3108): JoinOperator's with() calls the wrong TypeExtractor method +* [FLINK-3125](https://issues.apache.org/jira/browse/FLINK-3125): Web server starts also when JobManager log files cannot be accessed. +* [FLINK-3080](https://issues.apache.org/jira/browse/FLINK-3080): Relax restrictions of DataStream.union() +* [FLINK-3081](https://issues.apache.org/jira/browse/FLINK-3081): Properly stop periodic Kafka committer +* [FLINK-3082](https://issues.apache.org/jira/browse/FLINK-3082): Fixed confusing error about an interface that no longer exists +* [FLINK-3067](https://issues.apache.org/jira/browse/FLINK-3067): Enforce zkclient 0.7 for Kafka +* [FLINK-3020](https://issues.apache.org/jira/browse/FLINK-3020): Set number of task slots to maximum parallelism in local execution diff --git a/docs/content.tr/posts/2016-03-08-release-1.0.0.md b/docs/content.tr/posts/2016-03-08-release-1.0.0.md new file mode 100644 index 0000000000..f598bc9d13 --- /dev/null +++ b/docs/content.tr/posts/2016-03-08-release-1.0.0.md @@ -0,0 +1,124 @@ +--- +date: "2016-03-08T13:00:00Z" +title: Announcing Apache Flink 1.0.0 +aliases: +- /news/2016/03/08/release-1.0.0.html +--- + +The Apache Flink community is pleased to announce the availability of the 1.0.0 release. 
The community put significant effort into improving and extending Apache Flink since the last release, focusing on improving the experience of writing and executing data stream processing pipelines in production. + +
+ +
+
+Flink version 1.0.0 marks the beginning of the 1.X.X series of releases, which will maintain backwards compatibility with 1.0.0. This means that applications written against stable APIs of Flink 1.0.0 will compile and run with all Flink versions in the 1.x series. This is the first time we are formally guaranteeing compatibility in Flink's history, and we therefore see this release as a major milestone of the project, perhaps the most important since graduation as a top-level project.
+
+Apart from backwards compatibility, Flink 1.0.0 brings a variety of new user-facing features, as well as tons of bug fixes. About 64 contributors provided bug fixes, improvements, and new features, resolving more than 450 JIRA issues in total.
+
+We encourage everyone to [download the release](http://flink.apache.org/downloads.html) and [check out the documentation]({{< param DocsBaseUrl >}}flink-docs-release-1.0/). Feedback through the Flink [mailing lists](http://flink.apache.org/community.html#mailing-lists) is, as always, very welcome!
+
+## Interface stability annotations
+
+Flink 1.0.0 introduces interface stability annotations for API classes and methods. Interfaces defined as `@Public` are guaranteed to remain stable across all releases of the 1.x series. The `@PublicEvolving` annotation marks API features that may be subject to change in future versions.
+
+Flink's stability annotations will help users implement applications that compile and execute unchanged against future versions of Flink 1.x. This greatly reduces the complexity for users when upgrading to a newer Flink release.
+
+## Out-of-core state support
+
+Flink 1.0.0 adds a new state backend that uses RocksDB to store state (both windows and user-defined key-value state). [RocksDB](http://rocksdb.org/) is an embedded key/value store, originally developed by Facebook.
+When using this backend, active state in streaming programs can grow well beyond memory.
The RocksDB files are stored in a distributed file system such as HDFS or S3 for backups.
+
+## Savepoints and version upgrades
+
+Savepoints are checkpoints of the state of a running streaming job that can be manually triggered by the user while the job is running. Savepoints solve several production headaches, including code upgrades (both application and framework), cluster maintenance and migration, A/B testing and what-if scenarios, as well as testing and debugging. Read more about savepoints at the [data Artisans blog](http://data-artisans.com/how-apache-flink-enables-new-streaming-applications/).
+
+## Library for Complex Event Processing (CEP)
+
+Complex Event Processing has been one of the oldest and most important use cases of stream processing. The new CEP functionality in Flink allows you to use a distributed general-purpose stream processor instead of a specialized CEP system to detect complex patterns in event streams. Get started with [CEP on Flink]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/libs/cep.html).
+
+## Enhanced monitoring interface: job submission, checkpoint statistics and backpressure monitoring
+
+The web interface now allows users to submit jobs. Previous Flink releases had a separate service for submitting jobs. The new interface is part of the JobManager frontend. It also works on YARN now.
+
+Backpressure monitoring allows users to trigger a sampling mechanism which analyzes the time operators are waiting for new network buffers. When senders are spending most of their time waiting for new network buffers, they are experiencing backpressure from their downstream operators. Many users requested this feature for understanding bottlenecks in both batch and streaming applications.
+
+## Improved checkpointing control and monitoring
+
+Checkpointing has been extended with a more fine-grained control mechanism: in previous versions, new checkpoints were triggered independently of the speed at which old checkpoints completed.
This can lead to situations where new checkpoints are piling up, because they are triggered too frequently. + +The checkpoint coordinator now exposes statistics through our REST monitoring API and the web interface. Users can review the checkpoint size and duration on a per-operator basis and see the last completed checkpoints. This is helpful for identifying performance issues, such as processing slowdown by the checkpoints. + +## Improved Kafka connector and support for Kafka 0.9 + +Flink 1.0 supports both Kafka 0.8 and 0.9. With the new release, Flink exposes Kafka metrics for the producers and the 0.9 consumer through Flink’s accumulator system. We also enhanced the existing connector for Kafka 0.8, allowing users to subscribe to multiple topics in one source. + +## Changelog and known issues + +This release resolves more than 450 issues, including bug fixes, improvements, and new features. See the [complete changelog](/blog/release_1.0.0-changelog_known_issues.html#changelog) and [known issues](/blog/release_1.0.0-changelog_known_issues.html#known-issues). + +## List of contributors + +- Abhishek Agarwal +- Ajay Bhat +- Aljoscha Krettek +- Andra Lungu +- Andrea Sella +- Chesnay Schepler +- Chiwan Park +- Daniel Pape +- Fabian Hueske +- Filipe Correia +- Frederick F. Kautz IV +- Gabor Gevay +- Gabor Horvath +- Georgios Andrianakis +- Greg Hogan +- Gyula Fora +- Henry Saputra +- Hilmi Yildirim +- Hubert Czerpak +- Jark Wu +- Johannes +- Jun Aoki +- Jun Aoki +- Kostas Kloudas +- Li Chengxiang +- Lun Gao +- Martin Junghanns +- Martin Liesenberg +- Matthias J. 
Sax +- Maximilian Michels +- Márton Balassi +- Nick Dimiduk +- Niels Basjes +- Omer Katz +- Paris Carbone +- Patrice Freydiere +- Peter Vandenabeele +- Piotr Godek +- Prez Cannady +- Robert Metzger +- Romeo Kienzler +- Sachin Goel +- Saumitra Shahapure +- Sebastian Klemke +- Stefano Baghino +- Stephan Ewen +- Stephen Samuel +- Subhobrata Dey +- Suneel Marthi +- Ted Yu +- Theodore Vasiloudis +- Till Rohrmann +- Timo Walther +- Trevor Grant +- Ufuk Celebi +- Ulf Karlsson +- Vasia Kalavri +- fversaci +- madhukar +- qingmeng.wyh +- ramkrishna +- rtudoran +- sahitya-pavurala +- zhangminglei diff --git a/docs/content.tr/posts/2016-04-06-cep-monitoring.md b/docs/content.tr/posts/2016-04-06-cep-monitoring.md new file mode 100644 index 0000000000..cc9eb7175a --- /dev/null +++ b/docs/content.tr/posts/2016-04-06-cep-monitoring.md @@ -0,0 +1,218 @@ +--- +author: Till Rohrmann +author-twitter: stsffap +date: "2016-04-06T10:00:00Z" +excerpt: In this blog post, we introduce Flink's new CEP + library that allows you to do pattern matching on event streams. Through the + example of monitoring a data center and generating alerts, we showcase the library's + ease of use and its intuitive Pattern API. +title: Introducing Complex Event Processing (CEP) with Apache Flink +aliases: +- /news/2016/04/06/cep-monitoring.html +--- + +With the ubiquity of sensor networks and smart devices continuously collecting more and more data, we face the challenge to analyze an ever growing stream of data in near real-time. +Being able to react quickly to changing trends or to deliver up to date business intelligence can be a decisive factor for a company’s success or failure. +A key problem in real time processing is the detection of event patterns in data streams. + +Complex event processing (CEP) addresses exactly this problem of matching continuously incoming events against a pattern. +The result of a matching are usually complex events which are derived from the input events. 
+In contrast to traditional DBMSs where a query is executed on stored data, CEP executes data on a stored query.
+All data which is not relevant for the query can be immediately discarded.
+The advantages of this approach are obvious, given that CEP queries are applied on a potentially infinite stream of data.
+Furthermore, inputs are processed immediately.
+Once the system has seen all events for a matching sequence, results are emitted straight away.
+This aspect effectively leads to CEP's real-time analytics capability.
+
+Consequently, CEP's processing paradigm drew significant interest and found application in a wide variety of use cases.
+Most notably, CEP is used nowadays for financial applications such as stock market trend and credit card fraud detection.
+Moreover, it is used in RFID-based tracking and monitoring, for example, to detect thefts in a warehouse where items are not properly checked out.
+CEP can also be used to detect network intrusion by specifying patterns of suspicious user behaviour.
+
+Apache Flink, with its true streaming nature and its capabilities for low-latency as well as high-throughput stream processing, is a natural fit for CEP workloads.
+Consequently, the Flink community has introduced the first version of a new [CEP library]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/libs/cep.html) with [Flink 1.0](http://flink.apache.org/news/2016/03/08/release-1.0.0.html).
+In the remainder of this blog post, we introduce Flink's CEP library and illustrate its ease of use through the example of monitoring a data center.
+
+## Monitoring and alert generation for data centers
+
+ +
+
+Assume we have a data center with a number of racks.
+For each rack the power consumption and the temperature are monitored.
+Whenever such a measurement takes place, a new power or temperature event is generated, respectively.
+Based on this monitoring event stream, we want to detect racks that are about to overheat, and dynamically adapt their workload and cooling.
+
+For this scenario we use a two-stage approach.
+First, we monitor the temperature events.
+Whenever we see two consecutive events whose temperature exceeds a threshold value, we generate a temperature warning with the current average temperature.
+A temperature warning does not necessarily indicate that a rack is about to overheat.
+But whenever we see two consecutive warnings with increasing temperatures, then we want to issue an alert for this rack.
+This alert can then lead to countermeasures to cool the rack.
+
+### Implementation with Apache Flink
+
+First, we define the messages of the incoming monitoring event stream.
+Every monitoring message contains its originating rack ID.
+The temperature event additionally contains the current temperature and the power consumption event contains the current voltage.
+We model the events as POJOs:
+
+```java
+public abstract class MonitoringEvent {
+    private int rackID;
+    ...
+}
+
+public class TemperatureEvent extends MonitoringEvent {
+    private double temperature;
+    ...
+}
+
+public class PowerEvent extends MonitoringEvent {
+    private double voltage;
+    ...
+}
+```
+
+Now we can ingest the monitoring event stream using one of Flink's connectors (e.g. Kafka, RabbitMQ, etc.).
+This will give us a `DataStream<MonitoringEvent> inputEventStream`, which we will use as the input for Flink's CEP operator.
+But first, we have to define the event pattern to detect temperature warnings.
+The CEP library offers an intuitive [Pattern API]({{< param DocsBaseUrl >}}flink-docs-master/apis/streaming/libs/cep.html#the-pattern-api) to easily define these complex patterns.
+
+Every pattern consists of a sequence of events which can have optional filter conditions assigned.
+A pattern always starts with a first event to which we will assign the name `“First Event”`.
+
+```java
+Pattern.<MonitoringEvent>begin("First Event");
+```
+
+This pattern will match every monitoring event.
+Since we are only interested in `TemperatureEvents` whose temperature is above a threshold value, we have to add an additional subtype constraint and a where clause:
+
+```java
+Pattern.<MonitoringEvent>begin("First Event")
+    .subtype(TemperatureEvent.class)
+    .where(evt -> evt.getTemperature() >= TEMPERATURE_THRESHOLD);
+```
+
+As stated before, we want to generate a `TemperatureWarning` if and only if we see two consecutive `TemperatureEvents` for the same rack whose temperatures are too high.
+The Pattern API offers the `next` call which allows us to add a new event to our pattern.
+This event has to directly follow the first matching event in order for the whole pattern to match.
+
+```java
+Pattern<MonitoringEvent, ?> warningPattern = Pattern.<MonitoringEvent>begin("First Event")
+    .subtype(TemperatureEvent.class)
+    .where(evt -> evt.getTemperature() >= TEMPERATURE_THRESHOLD)
+    .next("Second Event")
+    .subtype(TemperatureEvent.class)
+    .where(evt -> evt.getTemperature() >= TEMPERATURE_THRESHOLD)
+    .within(Time.seconds(10));
+```
+
+The final pattern definition also contains the `within` API call which defines that two consecutive `TemperatureEvents` have to occur within a time interval of 10 seconds for the pattern to match.
+Depending on the time characteristic setting, this can either be processing, ingestion or event time.
+
+Having defined the event pattern, we can now apply it on the `inputEventStream`.
+
+```java
+PatternStream<MonitoringEvent> tempPatternStream = CEP.pattern(
+    inputEventStream.keyBy("rackID"),
+    warningPattern);
+```
+
+Since we want to generate our warnings for each rack individually, we `keyBy` the input event stream by the `“rackID”` POJO field.
+This enforces that matching events of our pattern will all have the same rack ID.
+
+The `PatternStream` gives us access to successfully matched event sequences.
+They can be accessed using the `select` API call.
+The `select` API call takes a `PatternSelectFunction` which is called for every matching event sequence.
+The event sequence is provided as a `Map<String, MonitoringEvent>` where each `MonitoringEvent` is identified by its assigned event name.
+Our pattern select function generates a `TemperatureWarning` event for each matching pattern.
+
+```java
+public class TemperatureWarning {
+    private int rackID;
+    private double averageTemperature;
+    ...
+}
+
+DataStream<TemperatureWarning> warnings = tempPatternStream.select(
+    (Map<String, MonitoringEvent> pattern) -> {
+        TemperatureEvent first = (TemperatureEvent) pattern.get("First Event");
+        TemperatureEvent second = (TemperatureEvent) pattern.get("Second Event");
+
+        return new TemperatureWarning(
+            first.getRackID(),
+            (first.getTemperature() + second.getTemperature()) / 2);
+    }
+);
+```
+
+Now we have generated a new complex event stream `DataStream<TemperatureWarning> warnings` from the initial monitoring event stream.
+This complex event stream can again be used as the input for another round of complex event processing.
+We use the `TemperatureWarnings` to generate `TemperatureAlerts` whenever we see two consecutive `TemperatureWarnings` for the same rack with increasing temperatures.
+The `TemperatureAlerts` have the following definition:
+
+```java
+public class TemperatureAlert {
+    private int rackID;
+    ...
+}
+```
+
+First, we have to define our alert event pattern:
+
+```java
+Pattern<TemperatureWarning, ?> alertPattern = Pattern.<TemperatureWarning>begin("First Event")
+    .next("Second Event")
+    .within(Time.seconds(20));
+```
+
+This definition says that we want to see two `TemperatureWarnings` within 20 seconds.
+The first event has the name `“First Event”` and the second consecutive event has the name `“Second Event”`.
+The individual events don’t have a where clause assigned, because we need access to both events in order to decide whether the temperature is increasing.
+Therefore, we apply the filter condition in the select clause.
+But first, we again obtain a `PatternStream`.
+
+```java
+PatternStream<TemperatureWarning> alertPatternStream = CEP.pattern(
+    warnings.keyBy("rackID"),
+    alertPattern);
+```
+
+Again, we `keyBy` the warnings input stream by the `"rackID"` so that we generate our alerts for each rack individually.
+Next we apply the `flatSelect` method, which will give us access to matching event sequences and allows us to output an arbitrary number of complex events.
+Thus, we generate a `TemperatureAlert` only if the temperature is increasing.
+
+```java
+DataStream<TemperatureAlert> alerts = alertPatternStream.flatSelect(
+    (Map<String, TemperatureWarning> pattern, Collector<TemperatureAlert> out) -> {
+        TemperatureWarning first = pattern.get("First Event");
+        TemperatureWarning second = pattern.get("Second Event");
+
+        if (first.getAverageTemperature() < second.getAverageTemperature()) {
+            out.collect(new TemperatureAlert(first.getRackID()));
+        }
+    });
+```
+
+The `DataStream<TemperatureAlert> alerts` is the data stream of temperature alerts for each rack.
+Based on these alerts we can now adapt the workload or cooling for overheating racks.
+
+The full source code for the presented example, as well as an example data source which randomly generates monitoring events, can be found in [this repository](https://github.com/tillrohrmann/cep-monitoring).
+
+## Conclusion
+
+In this blog post we have seen how easy it is to reason about event streams using Flink’s CEP library.
+Using the example of monitoring and alert generation for a data center, we have implemented a short program which notifies us when a rack is about to overheat and potentially fail.
+
+In the future, the Flink community will further extend the CEP library’s functionality and expressiveness.
+Next on the road map is support for a regular expression-like pattern specification, including Kleene star, lower and upper bounds, and negation.
+Furthermore, it is planned to allow the where-clause to access fields of previously matched events.
+This feature will allow pruning unpromising event sequences early.
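To make the two-stage rule concrete outside of Flink, the following dependency-free Java sketch (all names hypothetical) replays the logic for a single rack over an in-memory list of readings; unlike the CEP version, it ignores the 10- and 20-second `within` windows and per-rack keying:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free sketch of the two-stage rule for one rack:
// stage 1: two consecutive readings above the threshold yield a warning
//          carrying the average of the two temperatures;
// stage 2: two consecutive warnings with increasing averages yield an alert.
public class RackMonitorSketch {

    static final double TEMPERATURE_THRESHOLD = 100.0;

    // stage 1: emit one warning average per pair of consecutive hot readings
    static List<Double> warnings(List<Double> temperatures) {
        List<Double> averages = new ArrayList<>();
        for (int i = 1; i < temperatures.size(); i++) {
            if (temperatures.get(i - 1) >= TEMPERATURE_THRESHOLD
                    && temperatures.get(i) >= TEMPERATURE_THRESHOLD) {
                averages.add((temperatures.get(i - 1) + temperatures.get(i)) / 2);
            }
        }
        return averages;
    }

    // stage 2: alert if any consecutive pair of warning averages is increasing
    static boolean alert(List<Double> warningAverages) {
        for (int i = 1; i < warningAverages.size(); i++) {
            if (warningAverages.get(i - 1) < warningAverages.get(i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Double> averages = warnings(List.of(98.0, 102.0, 105.0, 109.0));
        System.out.println(averages);        // [103.5, 107.0]
        System.out.println(alert(averages)); // true
    }
}
```

In the sample run, only the pairs (102, 105) and (105, 109) exceed the threshold, producing warning averages 103.5 and 107.0; since the averages increase, an alert is raised, mirroring what the two chained CEP patterns compute.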
+ +*Note:* The example code requires Flink 1.0.1 or higher. + diff --git a/docs/content.tr/posts/2016-04-06-release-1.0.1.md b/docs/content.tr/posts/2016-04-06-release-1.0.1.md new file mode 100644 index 0000000000..5421b89758 --- /dev/null +++ b/docs/content.tr/posts/2016-04-06-release-1.0.1.md @@ -0,0 +1,70 @@ +--- +date: "2016-04-06T08:00:00Z" +title: Flink 1.0.1 Released +aliases: +- /news/2016/04/06/release-1.0.1.html +--- + +Today, the Flink community released Flink version **1.0.1**, the first bugfix release of the 1.0 series. + +We **recommend all users updating to this release** by bumping the version of your Flink dependencies to `1.0.1` and updating the binaries on the server. You can find the binaries on the updated [Downloads page](/downloads.html). + +## Fixed Issues + +
+### Bug
+
+* [[FLINK-3179](https://issues.apache.org/jira/browse/FLINK-3179)] - Combiner is not injected if Reduce or GroupReduce input is explicitly partitioned
+* [[FLINK-3472](https://issues.apache.org/jira/browse/FLINK-3472)] - JDBCInputFormat.nextRecord(..) has misleading message on NPE
+* [[FLINK-3491](https://issues.apache.org/jira/browse/FLINK-3491)] - HDFSCopyUtilitiesTest fails on Windows
+* [[FLINK-3495](https://issues.apache.org/jira/browse/FLINK-3495)] - RocksDB Tests can't run on Windows
+* [[FLINK-3533](https://issues.apache.org/jira/browse/FLINK-3533)] - Update the Gelly docs wrt examples and cluster execution
+* [[FLINK-3563](https://issues.apache.org/jira/browse/FLINK-3563)] - .returns() doesn't compile when using .map() with a custom MapFunction
+* [[FLINK-3566](https://issues.apache.org/jira/browse/FLINK-3566)] - Input type validation often fails on custom TypeInfo implementations
+* [[FLINK-3578](https://issues.apache.org/jira/browse/FLINK-3578)] - Scala DataStream API does not support Rich Window Functions
+* [[FLINK-3595](https://issues.apache.org/jira/browse/FLINK-3595)] - Kafka09 consumer thread does not interrupt when stuck in record emission
+* [[FLINK-3602](https://issues.apache.org/jira/browse/FLINK-3602)] - Recursive Types are not supported / crash TypeExtractor
+* [[FLINK-3621](https://issues.apache.org/jira/browse/FLINK-3621)] - Misleading documentation of memory configuration parameters
+* [[FLINK-3629](https://issues.apache.org/jira/browse/FLINK-3629)] - In wikiedits Quick Start example, "The first call, .window()" should be "The first call, .timeWindow()"
+* [[FLINK-3651](https://issues.apache.org/jira/browse/FLINK-3651)] - Fix faulty RollingSink Restore
+* [[FLINK-3653](https://issues.apache.org/jira/browse/FLINK-3653)] - recovery.zookeeper.storageDir is not documented on the configuration page
+* [[FLINK-3663](https://issues.apache.org/jira/browse/FLINK-3663)] - FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker
+* [[FLINK-3681](https://issues.apache.org/jira/browse/FLINK-3681)] - CEP library does not support Java 8 lambdas as select function
+* [[FLINK-3682](https://issues.apache.org/jira/browse/FLINK-3682)] - CEP operator does not set the processing timestamp correctly
+* [[FLINK-3684](https://issues.apache.org/jira/browse/FLINK-3684)] - CEP operator does not forward watermarks properly
+
+### Improvement
+
+* [[FLINK-3570](https://issues.apache.org/jira/browse/FLINK-3570)] - Replace random NIC selection heuristic by InetAddress.getLocalHost
+* [[FLINK-3575](https://issues.apache.org/jira/browse/FLINK-3575)] - Update Working With State Section in Doc
+* [[FLINK-3591](https://issues.apache.org/jira/browse/FLINK-3591)] - Replace Quickstart K-Means Example by Streaming Example
+
+### Test
+
+* [[FLINK-2444](https://issues.apache.org/jira/browse/FLINK-2444)] - Add tests for HadoopInputFormats
+* [[FLINK-2445](https://issues.apache.org/jira/browse/FLINK-2445)] - Add tests for HadoopOutputFormats
diff --git a/docs/content.tr/posts/2016-04-14-flink-forward-announce.md b/docs/content.tr/posts/2016-04-14-flink-forward-announce.md new file mode 100644 index 0000000000..a3c9773162 --- /dev/null +++ b/docs/content.tr/posts/2016-04-14-flink-forward-announce.md @@ -0,0 +1,14 @@ +--- +author: Aljoscha Krettek +author-twitter: aljoscha +date: "2016-04-14T10:00:00Z" +title: Flink Forward 2016 Call for Submissions Is Now Open +aliases: +- /news/2016/04/14/flink-forward-announce.html +--- + +We are happy to announce that the call for submissions for Flink Forward 2016 is now open! The conference will take place September 12-14, 2016 in Berlin, Germany, bringing together the open source stream processing community. Most Apache Flink committers will attend the conference, making it the ideal venue to learn more about the project and its roadmap and connect with the community. + +The conference welcomes submissions on everything Flink-related, including experiences with using Flink, products based on Flink, technical talks on extending Flink, as well as connecting Flink with other open source or proprietary software. + +Read more [here](http://flink-forward.org/). \ No newline at end of file diff --git a/docs/content.tr/posts/2016-04-22-release-1.0.2.md b/docs/content.tr/posts/2016-04-22-release-1.0.2.md new file mode 100644 index 0000000000..6268b24704 --- /dev/null +++ b/docs/content.tr/posts/2016-04-22-release-1.0.2.md @@ -0,0 +1,37 @@ +--- +date: "2016-04-22T08:00:00Z" +title: Flink 1.0.2 Released +aliases: +- /news/2016/04/22/release-1.0.2.html +--- + +Today, the Flink community released Flink version **1.0.2**, the second bugfix release of the 1.0 series. + +We **recommend all users updating to this release** by bumping the version of your Flink dependencies to `1.0.2` and updating the binaries on the server. You can find the binaries on the updated [Downloads page](/downloads.html). 
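The dependency bump mentioned above is a one-line change in the build file. As an illustration only (assuming a Maven project that depends on the `flink-java` artifact; adjust the artifact IDs to the Flink modules you actually use):

```xml
<!-- Illustrative sketch: bump the Flink dependency to the bugfix release. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.0.2</version>
</dependency>
```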
+ +## Fixed Issues + +### Bug + +* [[FLINK-3657](https://issues.apache.org/jira/browse/FLINK-3657)] [dataSet] Change access of DataSetUtils.countElements() to 'public' +* [[FLINK-3762](https://issues.apache.org/jira/browse/FLINK-3762)] [core] Enable Kryo reference tracking +* [[FLINK-3732](https://issues.apache.org/jira/browse/FLINK-3732)] [core] Fix potential null deference in ExecutionConfig#equals() +* [[FLINK-3760](https://issues.apache.org/jira/browse/FLINK-3760)] Fix StateDescriptor.readObject +* [[FLINK-3730](https://issues.apache.org/jira/browse/FLINK-3730)] Fix RocksDB Local Directory Initialization +* [[FLINK-3712](https://issues.apache.org/jira/browse/FLINK-3712)] Make all dynamic properties available to the CLI frontend +* [[FLINK-3688](https://issues.apache.org/jira/browse/FLINK-3688)] WindowOperator.trigger() does not emit Watermark anymore +* [[FLINK-3697](https://issues.apache.org/jira/browse/FLINK-3697)] Properly access type information for nested POJO key selection + +### Improvement + +- [[FLINK-3654](https://issues.apache.org/jira/browse/FLINK-3654)] Disable Write-Ahead-Log in RocksDB State + +### Docs +- [[FLINK-2544](https://issues.apache.org/jira/browse/FLINK-2544)] [docs] Add Java 8 version for building PowerMock tests to docs +- [[FLINK-3469](https://issues.apache.org/jira/browse/FLINK-3469)] [docs] Improve documentation for grouping keys +- [[FLINK-3634](https://issues.apache.org/jira/browse/FLINK-3634)] [docs] Fix documentation for DataSetUtils.zipWithUniqueId() +- [[FLINK-3711](https://issues.apache.org/jira/browse/FLINK-3711)][docs] Documentation of Scala fold()() uses correct syntax + +### Tests + +- [[FLINK-3716](https://issues.apache.org/jira/browse/FLINK-3716)] [kafka consumer] Decreasing socket timeout so testFailOnNoBroker() will pass before JUnit timeout diff --git a/docs/content.tr/posts/2016-05-11-release-1.0.3.md b/docs/content.tr/posts/2016-05-11-release-1.0.3.md new file mode 100644 index 0000000000..dc04a6edad --- /dev/null 
+++ b/docs/content.tr/posts/2016-05-11-release-1.0.3.md @@ -0,0 +1,35 @@ +--- +date: "2016-05-11T08:00:00Z" +title: Flink 1.0.3 Released +aliases: +- /news/2016/05/11/release-1.0.3.html +--- + +Today, the Flink community released Flink version **1.0.3**, the third bugfix release of the 1.0 series. + +We **recommend all users updating to this release** by bumping the version of your Flink dependencies to `1.0.3` and updating the binaries on the server. You can find the binaries on the updated [Downloads page](/downloads.html). + +## Fixed Issues + +### Bug + +* [[FLINK-3790](https://issues.apache.org/jira/browse/FLINK-3790)] [streaming] Use proper hadoop config in rolling sink +* [[FLINK-3840](https://issues.apache.org/jira/browse/FLINK-3840)] Remove Testing Files in RocksDB Backend +* [[FLINK-3835](https://issues.apache.org/jira/browse/FLINK-3835)] [optimizer] Add input id to JSON plan to resolve ambiguous input names +* [hotfix] OptionSerializer.duplicate to respect stateful element serializer +* [[FLINK-3803](https://issues.apache.org/jira/browse/FLINK-3803)] [runtime] Pass CheckpointStatsTracker to ExecutionGraph +* [hotfix] [cep] Make cep window border treatment consistent + +### Improvement + +* [[FLINK-3678](https://issues.apache.org/jira/browse/FLINK-3678)] [dist, docs] Make Flink logs directory configurable + +### Docs + +* [docs] Add note about S3AFileSystem 'buffer.dir' property +* [docs] Update AWS S3 docs + +### Tests + +* [[FLINK-3860](https://issues.apache.org/jira/browse/FLINK-3860)] [connector-wikiedits] Add retry loop to WikipediaEditsSourceTest +* [streaming-contrib] Fix port clash in DbStateBackend tests diff --git a/docs/content.tr/posts/2016-05-24-stream-sql.md b/docs/content.tr/posts/2016-05-24-stream-sql.md new file mode 100644 index 0000000000..3e5839f57f --- /dev/null +++ b/docs/content.tr/posts/2016-05-24-stream-sql.md @@ -0,0 +1,141 @@ +--- +author: Fabian Hueske +author-twitter: fhueske +date: "2016-05-24T10:00:00Z" +excerpt: |- +
+  About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. SQL is the standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysis and significantly simplify many applications including stream ingestion and simple transformations.
+
+  In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.
+title: Stream Processing for Everyone with SQL and Apache Flink +aliases: +- /news/2016/05/24/stream-sql.html +--- + +The capabilities of open source systems for distributed stream processing have evolved significantly over the last years. Initially, the first systems in the field (notably [Apache Storm](https://storm.apache.org)) provided low latency processing, but were limited to at-least-once guarantees, processing-time semantics, and rather low-level APIs. Since then, several new systems emerged and pushed the state of the art of open source stream processing in several dimensions. Today, users of Apache Flink or [Apache Beam](https://beam.incubator.apache.org) can use fluent Scala and Java APIs to implement stream processing jobs that operate in event-time with exactly-once semantics at high throughput and low latency. + +In the meantime, stream processing has taken off in the industry. We are witnessing a rapidly growing interest in stream processing which is reflected by prevalent deployments of streaming processing infrastructure such as [Apache Kafka](https://kafka.apache.org) and Apache Flink. The increasing number of available data streams results in a demand for people that can analyze streaming data and turn it into real-time insights. However, stream data analysis requires a special skill set including knowledge of streaming concepts such as the characteristics of unbounded streams, windows, time, and state as well as the skills to implement stream analysis jobs usually against Java or Scala APIs. People with this skill set are rare and hard to find. + +About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. SQL is *the* standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. 
Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysis and significantly simplify many applications including stream ingestion and simple transformations. In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.
+
+## Where did we come from?
+
+With the [0.9.0-milestone1](http://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html) release, Apache Flink added an API to process relational data with SQL-like expressions called the Table API. The central concept of this API is a Table, a structured data set or stream on which relational operations can be applied. The Table API is tightly integrated with the DataSet and DataStream API. A Table can be easily created from a DataSet or DataStream and can also be converted back into a DataSet or DataStream, as the following example shows:
+
+```scala
+val execEnv = ExecutionEnvironment.getExecutionEnvironment
+val tableEnv = TableEnvironment.getTableEnvironment(execEnv)
+
+// obtain a DataSet from somewhere
+val tempData: DataSet[(String, Long, Double)] =
+
+// convert the DataSet to a Table
+val tempTable: Table = tempData.toTable(tableEnv, 'location, 'time, 'tempF)
+// compute your result
+val avgTempCTable: Table = tempTable
+  .where('location.like("room%"))
+  .select(
+    ('time / (3600 * 24)) as 'day,
+    'location as 'room,
+    (('tempF - 32) * 0.556) as 'tempC
+  )
+  .groupBy('day, 'room)
+  .select('day, 'room, 'tempC.avg as 'avgTempC)
+// convert result Table back into a DataSet and print it
+avgTempCTable.toDataSet[Row].print()
+```
+
+Although the example shows Scala code, there is also an equivalent Java version of the Table API. The following picture depicts the original architecture of the Table API.
+
+
+
+ +A Table is created from a DataSet or DataStream and transformed into a new Table by applying relational transformations such as `filter`, `join`, or `select` on them. Internally, a logical table operator tree is constructed from the applied Table transformations. When a Table is translated back into a DataSet or DataStream, the respective translator translates the logical operator tree into DataSet or DataStream operators. Expressions like `'location.like("room%")` are compiled into Flink functions via code generation. + +However, the original Table API had a few limitations. First of all, it could not stand alone. Table API queries had to be always embedded into a DataSet or DataStream program. Queries against batch Tables did not support outer joins, sorting, and many scalar functions which are commonly used in SQL queries. Queries against streaming tables only supported filters, union, and projections and no aggregations or joins. Also, the translation process did not leverage query optimization techniques except for the physical optimization that is applied to all DataSet programs. + +## Table API joining forces with SQL + +The discussion about adding support for SQL came up a few times in the Flink community. With Flink 0.9 and the availability of the Table API, code generation for relational expressions, and runtime operators, the foundation for such an extension seemed to be there and SQL support the next logical step. On the other hand, the community was also well aware of the multitude of dedicated "SQL-on-Hadoop" solutions in the open source landscape ([Apache Hive](https://hive.apache.org), [Apache Drill](https://drill.apache.org), [Apache Impala](http://impala.io), [Apache Tajo](https://tajo.apache.org), just to name a few). Given these alternatives, we figured that time would be better spent improving Flink in other ways than implementing yet another SQL-on-Hadoop solution. 
+ +However, with the growing popularity of stream processing and the increasing adoption of Flink in this area, the Flink community saw the need for a simpler API to enable more users to analyze streaming data. About half a year ago, we decided to take the Table API to the next level, extend the stream processing capabilities of the Table API, and add support for SQL on streaming data. What we came up with was a revised architecture for a Table API that supports SQL (and Table API) queries on streaming and static data sources. We did not want to reinvent the wheel and decided to build the new Table API on top of [Apache Calcite](https://calcite.apache.org), a popular SQL parser and optimizer framework. Apache Calcite is used by many projects including Apache Hive, Apache Drill, Cascading, and many [more](https://calcite.apache.org/docs/powered_by.html). Moreover, the Calcite community put [SQL on streams](https://calcite.apache.org/docs/stream.html) on their roadmap which makes it a perfect fit for Flink's SQL interface. + +Calcite is central in the new design as the following architecture sketch shows: + +
+
+
+ +The new architecture features two integrated APIs to specify relational queries, the Table API and SQL. Queries of both APIs are validated against a catalog of registered tables and converted into Calcite's representation for logical plans. In this representation, stream and batch queries look exactly the same. Next, Calcite's cost-based optimizer applies transformation rules and optimizes the logical plans. Depending on the nature of the sources (streaming or static) we use different rule sets. Finally, the optimized plan is translated into a regular Flink DataStream or DataSet program. This step involves again code generation to compile relational expressions into Flink functions. + +The new architecture of the Table API maintains the basic principles of the original Table API and improves it. It keeps a uniform interface for relational queries on streaming and static data. In addition, we take advantage of Calcite's query optimization framework and SQL parser. The design builds upon Flink's established APIs, i.e., the DataStream API that offers low-latency, high-throughput stream processing with exactly-once semantics and consistent results due to event-time processing, and the DataSet API with robust and efficient in-memory operators and pipelined data exchange. Any improvements to Flink's core APIs and engine will automatically improve the execution of Table API and SQL queries. + +With this effort, we are adding SQL support for both streaming and static data to Flink. However, we do not want to see this as a competing solution to dedicated, high-performance SQL-on-Hadoop solutions, such as Impala, Drill, and Hive. Instead, we see the sweet spot of Flink's SQL integration primarily in providing access to streaming analytics to a wider audience. In addition, it will facilitate integrated applications that use Flink's API's as well as SQL while being executed on a single runtime engine. + +## How will Flink's SQL on streams look like? 
+ +So far we discussed the motivation for and architecture of Flink's stream SQL interface, but how will it actually look like? The new SQL interface is integrated into the Table API. DataStreams, DataSets, and external data sources can be registered as tables at the `TableEnvironment` in order to make them queryable with SQL. The `TableEnvironment.sql()` method states a SQL query and returns its result as a Table. The following example shows a complete program that reads a streaming table from a JSON encoded Kafka topic, processes it with a SQL query and writes the resulting stream into another Kafka topic. Please note that the KafkaJsonSource and KafkaJsonSink are under development and not available yet. In the future, TableSources and TableSinks can be persisted to and loaded from files to ease reuse of source and sink definitions and to reduce boilerplate code. + +```scala +// get environments +val execEnv = StreamExecutionEnvironment.getExecutionEnvironment +val tableEnv = TableEnvironment.getTableEnvironment(execEnv) + +// configure Kafka connection +val kafkaProps = ... +// define a JSON encoded Kafka topic as external table +val sensorSource = new KafkaJsonSource[(String, Long, Double)]( + "sensorTopic", + kafkaProps, + ("location", "time", "tempF")) + +// register external table +tableEnv.registerTableSource("sensorData", sensorSource) + +// define query in external table +val roomSensors: Table = tableEnv.sql( + "SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC " + + "FROM sensorData " + + "WHERE location LIKE 'room%'" + ) + +// define a JSON encoded Kafka topic as external sink +val roomSensorSink = new KafkaJsonSink(...) + +// define sink for room sensor data and execute query +roomSensors.toSink(roomSensorSink) +execEnv.execute() +``` + +You might have noticed that this example left out the most interesting aspects of stream data processing: window aggregates and joins. How will these operations be expressed in SQL? 
Well, that is a very good question. The Apache Calcite community put out an excellent proposal that discusses the syntax and semantics of [SQL on streams](https://calcite.apache.org/docs/stream.html). It describes Calcite’s stream SQL as *"an extension to standard SQL, not another ‘SQL-like’ language"*. This has several benefits. First, people who are familiar with standard SQL will be able to analyze data streams without learning a new syntax. Queries on static tables and streams are (almost) identical and can be easily ported. Moreover, it is possible to specify queries that reference static and streaming tables at the same time, which goes well together with Flink’s vision to handle batch processing as a special case of stream processing, i.e., as processing finite streams. Finally, using standard SQL for stream data analysis means following a well-established standard that is supported by many tools.
+
+Although we haven’t completely fleshed out the details of how windows will be defined in Flink’s SQL syntax and Table API, the following examples show how a tumbling window query could look in SQL and the Table API.
+
+### SQL (following the syntax proposal of Calcite’s streaming SQL document)
+
+```sql
+SELECT STREAM
+  TUMBLE_END(time, INTERVAL '1' DAY) AS day,
+  location AS room,
+  AVG((tempF - 32) * 0.556) AS avgTempC
+FROM sensorData
+WHERE location LIKE 'room%'
+GROUP BY TUMBLE(time, INTERVAL '1' DAY), location
+```
+
+### Table API
+
+```scala
+val avgRoomTemp: Table = tableEnv.ingest("sensorData")
+  .where('location.like("room%"))
+  .partitionBy('location)
+  .window(Tumbling every Days(1) on 'time as 'w)
+  .select('w.end, 'location, (('tempF - 32) * 0.556).avg as 'avgTempC)
+```
+
+## What's up next?
+
+The Flink community is actively working on SQL support for the next minor version Flink 1.1.0. In the first version, SQL (and Table API) queries on streams will be limited to selection, filter, and union operators.
Compared to Flink 1.0.0, the revised Table API will support many more scalar functions and be able to read tables from external sources and write them back to external sinks. A lot of work went into reworking the architecture of the Table API and integrating Apache Calcite. + +In Flink 1.2.0, the feature set of SQL on streams will be significantly extended. Among other things, we plan to support different types of window aggregates and maybe also streaming joins. For this effort, we want to closely collaborate with the Apache Calcite community and help extending Calcite's support for relational operations on streaming data when necessary. + +If this post made you curious and you want to try out Flink’s SQL interface and the new Table API, we encourage you to do so! Simply clone the SNAPSHOT [master branch](https://github.com/apache/flink/tree/master) and check out the [Table API documentation for the SNAPSHOT version]({{< param DocsBaseUrl >}}flink-docs-master/apis/table.html). Please note that the branch is under heavy development, and hence some code examples in this blog post might not work. We are looking forward to your feedback and welcome contributions. \ No newline at end of file diff --git a/docs/content.tr/posts/2016-08-04-release-1.1.0.md b/docs/content.tr/posts/2016-08-04-release-1.1.0.md new file mode 100644 index 0000000000..eb85d73414 --- /dev/null +++ b/docs/content.tr/posts/2016-08-04-release-1.1.0.md @@ -0,0 +1,235 @@ +--- +date: "2016-08-04T13:00:00Z" +title: Announcing Apache Flink 1.1.0 +aliases: +- /news/2016/08/08/release-1.1.0.html +--- + +
**Important**: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use 1.1.1 or 1.1.1-hadoop1 as the Flink version.
+ +The Apache Flink community is pleased to announce the availability of Flink 1.1.0. + +This release is the first major release in the 1.X.X series of releases, which maintains API compatibility with 1.0.0. This means that your applications written against stable APIs of Flink 1.0.0 will compile and run with Flink 1.1.0. 95 contributors provided bug fixes, improvements, and new features such that in total more than 450 JIRA issues could be resolved. See the [complete changelog](/blog/release_1.1.0-changelog.html) for more details. + +**We encourage everyone to [download the release](http://flink.apache.org/downloads.html) and [check out the documentation]({{< param DocsBaseUrl >}}flink-docs-release-1.1/). Feedback through the Flink [mailing lists](http://flink.apache.org/community.html#mailing-lists) is, as always, very welcome!** + +Some highlights of the release are listed in the following sections. + +## Connectors + +The [streaming connectors]({{< param DocsBaseUrl >}}flink-docs-release-1.1/apis/streaming/connectors/index.html) are a major part of Flink's DataStream API. This release adds support for new external systems and further improves on the available connectors. + +### Continuous File System Sources + +A frequently requested feature for Flink 1.0 was to be able to monitor directories and process files continuously. Flink 1.1 now adds support for this via `FileProcessingMode`s: + +```java +DataStream stream = env.readFile( + textInputFormat, + "hdfs:///file-path", + FileProcessingMode.PROCESS_CONTINUOUSLY, + 5000, // monitoring interval (millis) + FilePathFilter.createDefaultFilter()); // file path filter +``` + +This will monitor `hdfs:///file-path` every `5000` milliseconds. Check out the [DataSource documentation for more details]({{< param DocsBaseUrl >}}flink-docs-release-1.1/apis/streaming/index.html#data-sources). 
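Conceptually, continuous file monitoring boils down to polling the directory on every interval and emitting the paths that are new or whose modification timestamp changed since the last poll. The following self-contained sketch illustrates that change-detection step; the class name and structure are our own invention, not Flink's actual `FileProcessingMode` implementation:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only -- NOT Flink's implementation. Each call to poll()
// returns the paths that are new or whose modification timestamp changed
// since the previous call, which is the essence of what a
// PROCESS_CONTINUOUSLY source does once per monitoring interval.
public class ContinuousDirectoryPoll {

    private final Map<String, Long> seenModTimes = new HashMap<>();

    public List<String> poll(File directory) {
        List<String> changed = new ArrayList<>();
        File[] files = directory.listFiles();
        if (files == null) {
            return changed; // directory missing or not readable
        }
        for (File f : files) {
            Long previous = seenModTimes.get(f.getPath());
            if (previous == null || previous != f.lastModified()) {
                seenModTimes.put(f.getPath(), f.lastModified());
                changed.add(f.getPath()); // would be emitted downstream
            }
        }
        return changed;
    }
}
```

In Flink itself, monitoring and the actual reading of file contents are handled by the framework so that reading can be parallelized; the sketch only shows the change-detection idea.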
+
+### Kinesis Source and Sink
+
+Flink 1.1 adds a Kinesis connector for both consuming (`FlinkKinesisConsumer`) from and producing (`FlinkKinesisProducer`) to [Amazon Kinesis Streams](https://aws.amazon.com/kinesis/), which is a managed service purpose-built to make it easy to work with streaming data on AWS.
+
+```java
+DataStream<String> kinesis = env.addSource(
+    new FlinkKinesisConsumer<>("stream-name", schema, config));
+```
+
+Check out the [Kinesis connector documentation for more details]({{< param DocsBaseUrl >}}flink-docs-release-1.1/apis/streaming/connectors/kinesis.html).
+
+### Cassandra Sink
+
+The [Apache Cassandra](http://wiki.apache.org/cassandra/GettingStarted) sink allows you to write from Flink to Cassandra. Flink can provide exactly-once guarantees if the query is idempotent, meaning it can be applied multiple times without changing the result.
+
+```java
+CassandraSink.addSink(input)
+```
+
+Check out the [Cassandra Sink documentation for more details]({{< param DocsBaseUrl >}}flink-docs-release-1.1/apis/streaming/connectors/cassandra.html).
+
+## Table API and SQL
+
+The Table API is a SQL-like expression language for relational stream and batch processing that can be easily embedded in Flink’s DataSet and DataStream APIs (for both Java and Scala).
+
+```java
+Table custT = tableEnv
+    .toTable(custDs, "name, zipcode")
+    .where("zipcode = '12345'")
+    .select("name");
+```
+
+An initial version of this API was already available in Flink 1.0. For Flink 1.1, the community put a lot of work into reworking the architecture of the Table API and integrating it with [Apache Calcite](https://calcite.apache.org).
+
+In this first version, SQL (and Table API) queries on streams are limited to selection, filter, and union operators. Compared to Flink 1.0, the revised Table API supports many more scalar functions and is able to read tables from external sources and write them back to external sinks.
+ +```java +Table result = tableEnv.sql( + "SELECT STREAM product, amount FROM Orders WHERE product LIKE '%Rubber%'"); +``` +A more detailed introduction can be found in the [Flink blog](http://flink.apache.org/news/2016/05/24/stream-sql.html) and the [Table API documentation]({{< param DocsBaseUrl >}}flink-docs-release-1.1/apis/table.html). + +## DataStream API + +The DataStream API now exposes **session windows** and **allowed lateness** as first-class citizens. + +### Session Windows + + Session windows are ideal for cases where the window boundaries need to adjust to the incoming data. This enables you to have windows that start at individual points in time for each key and that end once there has been a *certain period of inactivity*. The configuration parameter is the session gap that specifies how long to wait for new data before considering a session as closed. + +
+
+
+
+```java
+input.keyBy(<key selector>)
+    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
+    .<windowed transformation>(<window function>);
+```
+
+### Support for Late Elements
+
+You can now specify how a windowed transformation should deal with late elements and how much lateness is allowed. The parameter for this is called *allowed lateness*. This specifies by how much time elements can be late.
+
+```java
+input.keyBy(<key selector>).window(<window assigner>)
+    .allowedLateness(