Let's Build an AI-Driven Search on Our Website

with embeddings and Redis

TYPO3 Developer Days 2025

Frank Berger

A bit about me

  • Frank Berger
  • Head of Engineering at sudhaus7.de, a label of the B-Factor GmbH, member of the code711.de network
  • Started as an Unix Systemadministrator who also develops in 1996
  • Working with PHP since V3
  • Does TYPO3 since 2005

What is an Embedding?

A point in multi-dimensional space, described by a vector

Mathematically represent characteristics and meaning of a word, phrase or text

It encodes both semantic and contextual information

AI and LLMs enable the creation of embeddings with hundreds or even thousands of dimensions

Used in most cases: Cosine Distance

Cosine distance is defined as the distance of the angle between two vectors normalized to unit length, ranging from 0 to 2

  • 0 = Synonymous (the same)
  • 1 = Orthogonal (no relation)
  • 2 = Antonymous (the opposite)

the shortest distance wins

In PHP we can calculate it like this:


	function distance($a,$b):float
	{
		return 1 - (dotp($a,$b) /
			sqrt(dotp($a,$a) * dotp($b,$b))
		);
	}
	// calculating the dot-product
	function dotp($a,$b):float
	{
		$products = array_map(function($da, $db) {
			return $da * $db;
		}, $a, $b);
		return (float)array_sum($products);
	}
					

This will produce a floating point number between 0 and 2

What does such an embedding look like?

the word/concept "Queen"

[-0.0045574773,-0.0067263762,-0.002498418,
	-0.018243857,-0.01689091,0.010516719,-0.0076504247,
	-0.024046184,-0.017365139,-0.012818122,0.0145058185,
	0.022330591,0.014533714,-0.0029691597,-0.018801773,
	0.008884814,0.043322187,0.021061333,0.029513761,
	-0.008801127,0.0020712635,0.014136199,-0.005460604,
	0.003598559,-0.005296716,-0.010230786,0.0072319875,
	... ,-0.011262931]

encoded in 768 dimensions, capturing meaning, context and relations

How do I get an embedding in PHP?

composer req openai-php/client

ollama run nomic-embed-text:latest


$client = OpenAI::factory()
	->withBaseUri('http://localhost:11434/v1')
	->make();
$result = $client->embeddings()->create([
	'model'=>'nomic-embed-text',
	'dimensions'=>768,
	'input'=>$text,
	'encoding_format'=> 'float',
]);
return $result->toArray()['data'][0]['embedding'];
					

Other good embeding engine: snowflake-arctic-embed2

Monarch + Woman = Queen

This is how LLMs calculate what you 'mean'

Normalize unformatted data


$embedding  = new Embedding();
$normalizeMe = $argv[1];

$dictionary  = $embedding->generateDictionary([
	'rock',	'paper', 'scissors', 'lizard', 'spock',
]);
$distances = $embedding->calculateDistances(
	$embedding->calculateEmbedding( $normalizeMe ),
	$dictionary
); // already sorted

print_r($distances);
printf('the Input "%s" is normalized to "%s"'."\n\n\n",
$normalizeMe,array_keys($distances)[0]);
					

This way we can normalize random data and text to our domain.

Embeddings in Redis

docker run --name my-Redis-container \
-p 6378:6379 -v `pwd`/dockerRedisdata:/data \
-d Redis/Redis-stack-server:latest
  • Defining our JSON structure
  • Creating an Index in Redis
  • Storing Embeddings in the Index
  • Searching Data using the Index

"Index" is here a synonym for a table and an index in an SQL database

Defining our JSON structure


{
	uid: 123,
	pid: 111,
	table: 'tt_content',
	text: 'lorem ipsum vitae dolor sit amit...',
	slug: 'https://my.domain.de/the/path/to/the/page',
	embedding: [0.45633,-0.567476,0.126775,...]
}
					

Creating an Index in Redis


FT.CREATE idx:MYINDEX ON JSON PREFIX 1 text: SCORE 1.0
SCHEMA
	$.uid NUMERIC
	$.pid NUMERIC
	$.table TEXT WEIGHT 1.0 NOSTEM
	$.text TEXT WEIGHT 1.0 NOSTEM
	$.slug TEXT WEIGHT 1.0 NOSTEM
	$.embedding AS vector VECTOR FLAT 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE
 					

Creating an Index in Redis

in PHP Using pRedis/pRedis


$client = new Predis\Client([ 'port'=>6378 ]);
$fields = [
	new NumericField('$.uid', "uid"),
	new NumericField('$.pid', "pid"),
	new TextField('$.table', "table"),
	new TextField('$.text', "text"),
	new TextField('$.slug', "slug"),
	new VectorField( '$.embedding', 'FLAT' ,[
		'TYPE','FLOAT32',
		'DIM','1536',
		'DISTANCE_METRIC','COSINE'
	],'vector')
];
$arguments = new CreateArguments();
$arguments->on('JSON' )->prefix( ['MYINDEX:'])->score(1.0);
$status = $client->ftcreate(
	'idx:MYINDEX', $fields, $arguments
);
var_dump($status);

Storing Embeddings in the Index


// $row is from the TYPO3 database,
// or the TYPO3 Datahandler or similar
// $table for example tt_content
$text = sprintf("%s\n%s",
	$row['header'],
	strip_tags( $row['bodytext'] )
);
$key = 'MYINDEX:'.$table.':'.$row['uid'];
$set = [
	'uid'=>$row['uid'],
	'pid'=>$row['pid'],
	'slug'=>$base.$row['slug'],
	'table'=>$table,
	'text' => $text,
	'embedding'=>getEmbedding($text)
];
$client = new Predis\Client(['port'=>6378]);
$client->jsonset( $key, '$', json_encode($set) );
					

Searching Data using the Index


use Predis\Command\Argument\Search\SearchArguments;
$searchEmbedding = getEmbedding( $searchTerm );
$client = new Predis\Client([ 'port' => 6378 ]);
$searchArguments = new SearchArguments();
$searchArguments
    ->dialect( 2 ) // must be 2 for vector
    ->sortBy('__vector_score') // closest neighbour by definition
    ->addReturn( 6, '__vector_score', 'uid','pid'
		,'text','table','slug')
    ->limit(0, 10)
    ->params( [
			'query_vector',
			pack( 'f1536', ... $searchEmbedding )
		]);

$rawResult = $client->ftsearch(
	'idx:MYINDEX',
	'(*)=>[KNN 100 @vector $query_vector]',
	$searchArguments
);
[$count,$results] = normalizeRedisResult( $rawResult );
					

Searching Data using the Index

normalizing the Redis-result

The RAW Result:


Array (
[0] => 3 // amount of result sets
[1] => MYINDEX:tt_content:2467 // KEY first set
[2] => Array (
	[0] => __vector_score
	[1] => 0.18477755785
	[2] => uid
	[3] => 2467
	[4] => text
	[5] => Contact
	[6] => table
	[7] => tt_content
	[8] => pid
	[9] => 265
	[10] => slug
	[11] => https://www.tcworld.info/contact
)
[3,4] // next key and set
[5,6] // next key and set
)
					

Searching Data using the Index

normalizing the Redis-result


function normalizeRedisResult(array $result):array {
	$count = array_shift($result);
	$return = [];
	foreach($result as $value) {
		if(!is_array($value)) $key=$value;
		else {
			$set = [];
			for($i=0,$l=count($value); $i<$l; $i=$i+2) {
				$k = $value[$i];
				$set[$k] = $value[$i+1];
			}
			$return[ $key ] = $set;
		}
	}
	return [$count,$return];
}
					

Searching Data using the Index

normalizing the Redis-result


Array
(
    [MYINDEX:tt_content:2509] => Array
        (
            [__vector_score] => 0.194359481335
            [uid] => 2509
            [text] => lorem ipsum vitae...
            [table] => tt_content
            [pid] => 273
            [slug] => https://www.tcworld.info/faq
        )
    [MYINDEX:tt_content:2528] => Array ()
    [MYINDEX:tt_content:2567] => Array ()
)
					

want to see this stuff in action?

CSS: https://picocss.com/

Javascript: VanillaJS

Embeddings: OpenAI text-embedding-3-small

Usecases

  • Product Search by similarity
  • Better Searches for API provided data (Redmine for example)
  • Faster quicksearches
  • Categorization/Normalisation of unstructured content

Caveats

  • You have to stick to a model and dimension
  • You might need to re-index sometimes

Alternatives to Redis

  • Qdrant
  • Pinecone (TYPO3 extension available)
  • Weaviate
  • PostgresSQL Vector
  • ... many more

Next steps

  • More Abstraction
  • DataHandler Support
  • Search Plugin (?)

Shameless plug

smartbrew.ai

What are your questions?

Thank you, I am here all weekend

Twitter: @FoppelFB | Mastodon: @foppel@phpc.social

fberger@sudhaus7.de | https://sudhaus7.de/