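As the market gradually matures, operations and product improvement work must become increasingly fine-grained for a company to stay competitive in the long run.

In the game industry, for example, player retention is a key metric. To improve retention, we need a fine-grained analysis of the step at which players churn. By setting key checkpoints in the order the game progresses and analyzing the churn data at each checkpoint, we can build a player churn funnel. With the funnel in hand, we can pick the stages with high churn rates for further fine-grained analysis and find the causes of churn, such as device compatibility problems, onboarding guidance that lacks appeal, or numerical design problems. Based on these causes, we can then make targeted improvements on the product and operations side.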
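As a rough illustration of the funnel idea, here is a minimal Python sketch. The checkpoint names and player counts are hypothetical, made up purely for this example, and the churn rate of a checkpoint is assumed to be the share of players who reached the previous checkpoint but not this one.

```python
from typing import Dict, List, Tuple


def churn_funnel(
    checkpoints: List[str], players_reached: Dict[str, int]
) -> List[Tuple[str, int, float]]:
    """Return (checkpoint, players reached, churn rate vs. previous checkpoint)."""
    rows = []
    previous = None
    for checkpoint in checkpoints:
        reached = players_reached[checkpoint]
        if previous is None or previous == 0:
            # The first checkpoint has nothing to compare against.
            churn_rate = 0.0
        else:
            churn_rate = (previous - reached) / previous
        rows.append((checkpoint, reached, churn_rate))
        previous = reached
    return rows


# Hypothetical checkpoints, ordered by game progress, with made-up counts.
checkpoints = ["install", "tutorial_done", "level_5", "level_10"]
counts = {"install": 1000, "tutorial_done": 600, "level_5": 480, "level_10": 240}

funnel = churn_funnel(checkpoints, counts)

# The checkpoint with the highest churn rate is the first candidate
# for a deeper look (the first checkpoint is skipped, it has no predecessor).
worst_checkpoint = max(funnel[1:], key=lambda row: row[2])[0]

for name, reached, rate in funnel:
    print(f"{name:15s} {reached:6d} {rate:6.1%}")
```

In a real project the counts would come from event logs rather than a hard-coded dictionary, but the shape of the calculation stays the same.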

Read more »

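When doing data work in a large enterprise, you inevitably have to deal with many different kinds of databases. Oracle, thanks to its excellent performance, was once the standard commercial database at many large enterprises, so it is naturally one of the databases we need to focus on.

Importing and exporting data in Oracle is a basic skill, but it can be a hurdle for people who understand databases yet are unfamiliar with Oracle. I ran into exactly this task in a recent project, so I did some research on Oracle data import and export and would like to share it here.

Read more »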

Easy SQL

SQL as the main ETL language

Speaking of data development, we have seen various programming languages being used.

Some teams choose Python for its simplicity and for the great pandas library. Other teams choose Scala if they are using Spark. Others may try the Spark DataFrame API, etc.

Read more »

Elegant. Image from https://www.yezibizhi.com/Img-4/100422/111045.shtml

In the previous post, we talked about a new ETL language – Easy SQL. You may be very curious about how to write ETL in Easy SQL. Let’s take a peek at it today.

Read more »

Easy SQL language features mind map

Previous posts about Easy SQL

People like to use Scala because Scala provides powerful type inference and embraces various programming paradigms. People like to use Python because it’s clean, out-of-the-box, delicate and expressive. People like to use Rust because Rust provides modern language features and zero-cost abstractions.

Read more »

Efficiency. Image from https://unsplash.com/photos/gZB-i-dA6ns

Previous posts about Easy SQL

ETL testing has always been a pain point, but it is increasingly becoming a must now that data is so widely used.

An ETL with more than 100 lines of code is common. The filter conditions, data transformation rules, join conditions and other logic in it can be very complicated.

Read more »

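As data is put to use in more and more enterprises, data technology has been advancing by leaps and bounds. Not only does the Hadoop-based big data ecosystem keep maturing, but many emerging distributed technologies are also surging in like a tide. Below is the big data technology landscape compiled in the Big Data White Paper (2020) from the China Academy of Information and Communications Technology (CAICT):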

Data Tech Stack

Read more »

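The previous post discussed the concepts behind agile data engineering practices. So what are the concrete agile data engineering practices? This post shares the practice of "everything as code".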

Everything as code

XX as code

In application software development, "everything as code" is discussed a lot. Common forms of "XX as code" include:

Read more »

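The previous post discussed the concepts behind agile data engineering practices. So what are the concrete agile data engineering practices? This post shares the practice of "code-based reuse".

Code reuse in application software development

In application software development, code reuse is an obvious thing that developers do almost every day. Good code reuse effectively lowers code duplication, improves efficiency, and reduces potential bugs.

Read more »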