tensorflow

Posted on 2024-06-04 Edited on 2025-05-22

Basic elements

tf.Variable(……)

import tensorflow as tf
A = tf.Variable(3, name="number")
B = tf.Variable([1,3], name="vector")
C = tf.Variable([[0,1],[2,3]], name="matrix")
D = tf.Variable(tf.zeros([100]), name="zero")
E = tf.Variable(tf.random_normal([2,3], mean=1, stddev=2, dtype=tf.float32))

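tf.Variable() can be thought of as a constructor. It requires an initial value, which can be a Tensor of any shape and type. The code in this note uses the TensorFlow 1.x graph API; in TensorFlow 2.x the same functions are available under tf.compat.v1.
A variable goes through two steps, in this order:

Create
Initialize

Variables must be initialized before use, and the initialization must finish before any other operation in the model runs. There are usually three ways to initialize variables:

1. Initialize all variables
init = tf.global_variables_initializer()
global_variables_initializer() initializes every global variable, however many there are. It is the simplest and most commonly used approach.
2. Initialize a subset of variables
init_subset = tf.variables_initializer([B, C], name="init_subset")
variables_initializer() initializes only the listed variables, which can save memory compared with initializing everything.
3. Initialize a single variable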

recommendation_system

Posted on 2024-06-04 Edited on 2025-05-22

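Papers that use embedding strategies to extract user information in various task settings
"Modeling User Activities on the Web using Paragraph Vector"
While browsing the web, a user leaves behind a series of activities, such as page views, search queries, and ad clicks. Let the $T_i$ activities left by user $i$ be $(a_{i,1}, a_{i,2}, \dots, a_{i,T_i})$. From the activity data of all users, we want to produce a vector representing each user, so that users with similar activity sequences are mapped to nearby points in the vector space.
The paper borrows the idea of skip-gram.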
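As a rough illustration of the skip-gram idea applied to an activity sequence, the sketch below generates (target, context) pairs from one user's activities. It is plain skip-gram pair generation, not the model from the paper; the function name `skipgram_pairs`, the `window` parameter, and the activity strings are assumptions made up for this example.

```python
def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for each item and its neighbours within `window`."""
    pairs = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an item is not its own context
                pairs.append((target, sequence[j]))
    return pairs


# Hypothetical activity sequence of a single user.
activities = ["search:shoes", "view:page_12", "click:ad_7"]
for pair in skipgram_pairs(activities, window=1):
    print(pair)
```

A skip-gram style model would then be trained to predict the context activity from the target; how the per-user vector enters that prediction is specific to the paper and is not shown here.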

Deep & Cross Network for Ad Click Predictions

https://blog.csdn.net/Dby_freedom/article/details/86502623
https://blog.csdn.net/qq_40778406/article/details/105009989
https://www.cnblogs.com/LuckPsyduck/p/11995230.html
https://zhuanlan.zhihu.com/p/96010464

hive

Posted on 2024-06-04 Edited on 2025-05-22

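Viewing table information

This shows the table's location, tableType, and so on.
The Table Type value tells you whether the table is a managed (internal) table or an external table.

describe extended tablename;
desc formatted tablename;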

Viewing table partitions

show partitions ${table_name};

data_process

Posted on 2024-06-04 Edited on 2025-05-22

Notes on Byte, KB, MB

http://myrepono.com/faq/4

A byte is a sequence of 8 bits (enough to represent one alphanumeric character) processed as a single unit of information. A single letter or character would use one byte of memory (8 bits), two characters would use two bytes (16 bits).

1024 bytes = 1 KB
1024 KB = 1 MB
1024 MB = 1 GB
1024 GB = 1 TB
1024 TB = 1 PB
KB = Kilobyte
MB = Megabyte
GB = Gigabyte
TB = Terabyte
PB = Petabyte
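The conversion table above can be turned into a small helper. This is a minimal sketch using the 1024-based units from the list; the names `UNITS` and `human_readable` are made up for this example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_readable(num_bytes):
    """Format a byte count using 1024-based units (B, KB, MB, GB, TB, PB)."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1024, or at PB.
        if size < 1024 or unit == UNITS[-1]:
            return "{:.2f} {}".format(size, unit)
        size /= 1024


print(human_readable(1536))         # 1.50 KB
print(human_readable(5 * 1024**3))  # 5.00 GB
```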

Estimating DataFrame memory usage


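def memory_usage(df):
    """Print the memory used by each column of a pandas DataFrame, plus the total."""
    types = df.dtypes
    # deep=True also counts the contents of object columns (e.g. strings)
    s = df.memory_usage(deep=True)
    s = s / 1024**2  # bytes -> MB
    total_mem = s.sum()  # the total includes the index
    for column in df.columns:
        if s[column] < 0.01:
            print("{} = {:.2f} KB,  {}".format(column, s[column] * 1024, types[column]))
        else:
            print("{} = {:1.2f} MB,  {}".format(column, s[column], types[column]))
    print("total memory: {:.2f} MB".format(total_mem))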

spark

Posted on 2024-06-04 Edited on 2025-05-22

pyspark API

https://spark.apache.org/docs/2.2.1/api/python/search.html?q=dataframe.write

Concise tutorials

https://sparkbyexamples.com/pyspark/pyspark-orderby-and-sort-explained/
http://www.learnbymarketing.com/1100/pyspark-joins-by-example/

conda

Posted on 2024-02-29 Edited on 2025-05-22

conda is Anaconda's toolbox. It combines the roles of pip and virtualenv: it can manage packages like pip and switch environments like virtualenv.

installation

https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart

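Using pip inside conda

Note: when installing packages with pip inside a conda environment, run pip -V to check which pip is being used. If the conda environment has no pip installed, the system default pip is used instead, so packages end up in the system library location and the Python in the current environment cannot find or use them.
A conda environment usually ships with its own pip. If you install with pip3, you may be using the system pip, so watch out for this.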

(face_detector) ➜  ~ pip -V
pip 9.0.1 from /opt/anaconda3/envs/face_detector/lib/python3.5/site-packages (python 3.5)
(face_detector) ➜  ~ pip3 -V
pip 20.0.2 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
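The same check can be scripted from inside the environment's Python. The sketch below compares the `pip` found on PATH with the prefix of the running interpreter; the helper name `is_inside` is made up for this example, and the check assumes `pip` is a regular script inside the environment rather than a symlink pointing elsewhere.

```python
import os
import shutil
import sys


def is_inside(path, prefix):
    """Return True if `path` lies under the directory `prefix`."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    try:
        return os.path.commonpath([path, prefix]) == prefix
    except ValueError:  # e.g. paths on different drives on Windows
        return False


pip_path = shutil.which("pip")  # the pip that the shell would run
if pip_path is None:
    print("no pip found on PATH")
elif is_inside(pip_path, sys.prefix):
    print("pip belongs to this environment:", pip_path)
else:
    print("pip comes from elsewhere:", pip_path)
```

Running pip as `python -m pip install ...` avoids the ambiguity, because it uses the pip installed for whichever `python` is active.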

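A new environment is installed by default under the envs directory inside your conda directory. You can specify a different path; run conda create -h to learn more.
If no Python version is specified, conda installs the same Python version that was installed together with conda originally.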

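# List all environments
conda info --envs
conda env list
# Create an environment
conda create --name new_env_name python=2.7.9
conda create -n new_env_name python=2.7.9
# Clone an environment (e.g. clone base, located at ~/path, into copy_base)
conda create -n copy_base --clone ~/path
# Activate an environment
source activate new_env_name  # linux (conda 4.4+: conda activate new_env_name)
activate new_env_name  # windows
# Deactivate the environment
source deactivate  # linux (conda 4.4+: conda deactivate)
deactivate  # windows
# Remove an environment
conda remove --name new_env_name --all
conda remove -n new_env_name --all
# Save / share an environment
conda env export > environment.yml
# Restore an environment
conda env create -f environment.yml
# List all packages in the current environment
conda list
# Install a package into a given environment
conda install -n env_name package_name
# Find the available versions of a package
conda search tensorflow-gpu
# Put conda on PATH (replace YOUR_SHELL_NAME with your shell, e.g. bash or zsh)
eval "$(/home/yuanwenwu/anaconda3/bin/conda shell.YOUR_SHELL_NAME hook)"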

Starting a notebook from a python2.7 environment so that python3 and python2 kernels coexist

# Enter the python2 environment, run the command below, then start jupyter notebook
python -m ipykernel install --user

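Problem 1: a package that python can find is not found by jupyter notebook

This happens because the Python executable paths differ.
The sys module helps locate the problem:

import sys
print(sys.executable)

This usually happens when only python is installed in the environment but you want to use ipython or jupyter notebook. Since the environment does not have them, the system-wide versions are used, so the interpreter path differs. The fix is to conda install the needed package (ipython or jupyter notebook) inside the environment.
Reference: https://blog.csdn.net/sunxinyu/article/details/78801534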
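A slightly fuller version of the sys check is sketched below: run it once in a terminal Python session and once in a notebook cell, then compare the output. The helper name `describe_environment` is made up for this example, and `"json"` is only a stand-in for whichever package goes missing.

```python
import importlib.util
import sys


def describe_environment(package_name):
    """Return the interpreter path, its prefix, and where a package would be imported from."""
    spec = importlib.util.find_spec(package_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # None means this interpreter cannot find the package at all.
        "package_location": spec.origin if spec is not None else None,
    }


info = describe_environment("json")  # replace "json" with the package in question
for key, value in info.items():
    print("{}: {}".format(key, value))
```

If the two runs print different `executable` values, the notebook kernel is using a different interpreter from the one in the environment.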

Changing conda channels (mirrors)

Installing a specific package version with conda

conda search tensorflow
conda install tensorflow-gpu==2.0.0

matplotlib

Posted on 2024-02-29 Edited on 2025-05-22

matplotlib

pyplot

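pyplot draws plots and decorates the figure. Note that it keeps a MATLAB-like style: it always tracks the current figure, and plotting functions act directly on the current axes.
figure() creates a figure. You do not need to call it explicitly by default, and if no axes is specified manually, a subplot(111) is also created by default. figure(i) creates the figure numbered i.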

linux

Posted on 2024-02-29 Edited on 2025-05-22

Arrays

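# Create an array (elements are separated by spaces, not commas)
array=(1 2 3 4)

# Get all elements
echo ${array[@]}

# Get the first element
echo ${array[0]}

# Get the number of elements
echo ${#array[@]}

# If an element is a string, its length can be obtained by giving its index:
echo ${#array[2]}

# This works because the length of a string is obtained the same way
str="hello world"
echo ${#str}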

linear-algebra

Posted on 2024-02-29 Edited on 2025-05-22 In math

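While studying SVM, I ran into the dot product and could not immediately recall why the inner product of vectors defined algebraically equals the one defined geometrically. I looked it up, found it interesting, and recorded it below.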
This operation can be defined either algebraically or geometrically.

Algebraically, it is the sum of the products of the corresponding entries of the two sequences of numbers.
Geometrically, it is the product of the Euclidean magnitudes of the two vectors and the cosine of the angle between them.

Algebraic definition

The dot product of two vectors $\mathbf{A} = [A_1, A_2, \dots, A_n]$ and $\mathbf{B} = [B_1, B_2, \dots, B_n]$ is defined as:

$\mathbf{A}\cdot \mathbf{B} = \sum_{i=1}^n A_iB_i = A_1B_1 + A_2B_2 + \cdots + A_nB_n$

Geometric definition

In Euclidean space, a Euclidean vector is a geometrical object that possesses both a magnitude and a direction. A vector can be pictured as an arrow. Its magnitude is its length, and its direction is the direction that the arrow points. The magnitude of a vector A is denoted by $|\mathbf{A}|$.
The dot product of two Euclidean vectors A and B is defined by

$\mathbf{A}\cdot\mathbf{B} = |\mathbf{A}||\mathbf{B}|\cos\theta$

where θ is the angle between A and B.

Equivalence of the definitions

If $\mathbf e_1,\dots,\mathbf e_n$ are the standard basis vectors in $\mathbb{R}^n$, then we may write

$\begin{aligned} \mathbf A &= [A_1,\dots,A_n] = \sum_i A_i\mathbf e_i\\ \mathbf B &= [B_1,\dots,B_n] = \sum_i B_i\mathbf e_i. \end{aligned}$

The vectors $\mathbf e_i$ are an orthonormal basis, which means that they have unit length and are at right angles to each other. Hence, since these vectors have unit length,
$\mathbf e_i\cdot\mathbf e_i=1$
and since they form right angles with each other, if $i \neq j$,
$\mathbf e_i\cdot\mathbf e_j = 0.$
Also, by the geometric definition, for any vector $\mathbf e_i$ and a vector $\mathbf A$, we note
$\mathbf A\cdot\mathbf e_i = |\mathbf A|\,|\mathbf e_i|\cos\theta_i = |\mathbf A|\cos\theta_i = A_i$,
where $\theta_i$ is the angle between $\mathbf A$ and $\mathbf e_i$, and $A_i$ is the component of vector $\mathbf A$ in the direction of $\mathbf e_i$.
Now applying the distributivity of the geometric version of the dot product gives

$\mathbf A\cdot\mathbf B = \mathbf A\cdot\sum_i B_i\mathbf e_i = \sum_i B_i(\mathbf A\cdot\mathbf e_i) = \sum_i B_iA_i$

which is precisely the algebraic definition of the dot product. So the (geometric) dot product equals the (algebraic) dot product.
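The equivalence can also be checked numerically. The sketch below, limited to 2D for simplicity, computes the dot product both ways; the angle is obtained from atan2 so that the geometric version does not rely on the algebraic one. The function names and the sample vectors are made up for this example.

```python
import math


def dot_algebraic(a, b):
    """Sum of the products of corresponding entries."""
    return sum(x * y for x, y in zip(a, b))


def dot_geometric(a, b):
    """|A||B|cos(theta) for 2D vectors, with theta measured via atan2."""
    norm_a = math.hypot(a[0], a[1])
    norm_b = math.hypot(b[0], b[1])
    theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])
    return norm_a * norm_b * math.cos(theta)


a, b = (3.0, 4.0), (2.0, -1.0)
print(dot_algebraic(a, b))  # 3*2 + 4*(-1) = 2.0
print(dot_geometric(a, b))  # agrees with the line above up to floating-point rounding
```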

gitNote

Posted on 2024-02-20 Edited on 2025-05-22

Creating (linking) a remote branch

### Git global setup
git config --global user.name "yuanwenwu3"
git config --global user.email "yuanwenwu3@jd.com"

### Create a new repository

git clone git@git.jd.com:yuanwenwu3/draw_tensorboard.git
cd draw_tensorboard
touch README.md
git add README.md
git commit -m "add README"
git push -u origin master

### Push an existing folder
cd existing_folder
git init
git remote add origin git@git.jd.com:yuanwenwu3/draw_tensorboard.git
git add .
git commit -m "Initial commit"
git push -u origin master

### Push an existing Git repository
cd existing_repo
git remote rename origin old-origin
git remote add origin git@git.jd.com:yuanwenwu3/draw_tensorboard.git
git push -u origin --all
git push -u origin --tags

### Migrating a project
# Create a new empty repository, e.g. at url-destination
# On the main branch of the current project
git remote add destination ${url-destination}
git push -u destination --all