Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi Team, by default, where does Azure Databricks store Ganglia metrics snapshots and other cluster logs? And if we want to do a manual cleanup, what are the steps?
I have a SQL query which I am converting into Spark SQL in Azure Databricks, running in my Jupyter notebook. In my SQL query, a column named Type is created on the fly, with the value 'Goal' for every row: SELECT Type='Goal', Value FROM table. Now, when...
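For reference, the `Type = 'Goal'` alias form is T-SQL-specific; Spark SQL uses `AS` to alias a literal column. A minimal sketch (the table name is a placeholder):

```sql
-- T-SQL form:      SELECT Type = 'Goal', Value FROM some_table
-- Spark SQL form:  alias the literal with AS
SELECT 'Goal' AS Type, Value FROM some_table
```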
You can use the extended table description. For example, the following Python code will print the current definition of the view:
table_name = ""
df = spark.sql("describe table extended {}".format(table_name))
df.createOrReplaceTempView("view_desript...
I am a novice with Databricks and am doing some independent learning. I am trying to add a column to an existing table. Here is my syntax:
%sql
ALTER TABLE car_parts ADD COLUMNS (engine_present boolean)
which returns the error: SyntaxError: inva...
Is the table you are working with in the Delta format? The table commands (e.g. ALTER) do not work for all storage formats. For example, if I run the following commands, then I can alter a table. Note: there is no data in the table, but the table exist...
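A minimal sketch of that pattern in a %sql cell, assuming the Delta format (the table and column names mirror the question; the schema is illustrative):

```sql
-- Creating the table explicitly as Delta makes ALTER TABLE ... ADD COLUMNS work:
CREATE TABLE car_parts (part_id INT, part_name STRING) USING DELTA;
ALTER TABLE car_parts ADD COLUMNS (engine_present BOOLEAN);
```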
Hi,
I am looking at my Databricks workspace and it looks like I am missing the DBFS databricks-datasets root folder. The DBFS root folders I can view are FileStore, local_disk(), mnt, pipelines, and user.
Can I mount databricks-datasets, or am I missing some...
Hello everyone! First post on the forums; I've been stuck on this for a while now and cannot understand why this is happening. Basically, I have been using what seems to be a premade Databricks notebook from Databricks themselves for a DNS Analytics exa...
@NickGoodfella​, what's the notebook you're looking at, this one? https://databricks.com/notebooks/dns-analytics.html Are you sure all the previous cells executed? The error suggests there isn't a model at the expected path. You can take a lo...
I have a customer with a streaming pipeline from Kafka to Delta. They are leveraging RocksDB, watermarking for 30 minutes, and attempting dropDuplicates. They are seeing their state grow to 6.2 billion rows on a stream that peaks at 7000 rows ...
I've seen a similar issue with large state using flatMapGroupsWithState. It is possible that (a) they are not using state.setTimeout correctly, or (b) they are not calling state.remove() when the stored state has timed out, leaving the state to gr...
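To illustrate why unevicted state grows without bound, here is a pure-Python conceptual model of watermarked deduplication. This is not Spark's actual implementation and all names are illustrative; the point is that duplicate detection only works while a key is held in state, and keys older than the watermark must be evicted or the state dictionary grows forever.

```python
from datetime import datetime, timedelta

# Conceptual model of watermarked dedup state (pure Python, illustrative names).
watermark = timedelta(minutes=30)
state = {}  # event_id -> event_time

def observe(event_id, event_time, max_event_time_seen):
    """Return True if the event is new within the watermark window."""
    cutoff = max_event_time_seen - watermark
    # Evict entries older than the watermark: the step that bounds state size.
    # Skipping this (or filtering on a column the watermark doesn't cover)
    # is how state grows to billions of rows.
    for key in [k for k, t in state.items() if t < cutoff]:
        del state[key]
    if event_id in state:
        return False  # duplicate within the window: drop it
    state[event_id] = event_time
    return True
```

In Spark terms, the analogous pitfall is calling dropDuplicates on a column subset that does not include the watermarked event-time column, in which case the watermark cannot purge old keys.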
How do you calculate a median on Delta tables in Azure Databricks using SQL?
select col1, col2, col3, median(col5) from delta table group by col1, col2, col3
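One hedged workaround: older Spark SQL runtimes do not ship a `median` aggregate, but the built-in `percentile` (exact) or `percentile_approx` (approximate, cheaper on large data) can stand in for it. The table and column names below are illustrative:

```sql
SELECT col1, col2, col3,
       percentile(col5, 0.5) AS median_col5
FROM my_delta_table
GROUP BY col1, col2, col3
```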
We have created a table using the new generated column feature (https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#deltausegeneratedcolumns)
CREATE TABLE ingest.MyEvent(
data binary,
topic string,
timestamp timestamp,
date dat...
I think you have to pass a date in your SELECT query instead of a timestamp. The generated column will indeed derive a date from the timestamp and partition by it.
But the docs state:
When you write to a table with generated columns and you do not ex...
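For context, a sketch of a generated-column table following the same pattern (the DDL quoted above is truncated, so the `date` expression here is an assumption for illustration): the `date` column is derived from `timestamp`, so writes only need to supply the timestamp.

```sql
CREATE TABLE ingest.MyEvent (
  data BINARY,
  topic STRING,
  timestamp TIMESTAMP,
  -- assumed generation expression; writes may omit this column
  date DATE GENERATED ALWAYS AS (CAST(timestamp AS DATE))
)
USING DELTA
PARTITIONED BY (date);
```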
When trying to update or display the DataFrame, one of the Parquet files is having some issue:
"Parquet column cannot be converted. Expected: DecimalType(38,18), Found: DOUBLE"
What could be the issue?
There is a mount path /mnt/folder
I am passing filename as a variable from another function and completing the path variable as follows:
filename=file.txt
path=/mnt/folder/subfolder/+filename
When I'm trying to use the path variable in a function, f...
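Assuming the two assignments above are Python, both values need to be quoted string literals before concatenation; as written they would raise a SyntaxError. A minimal sketch using the names from the question:

```python
# Both values must be quoted strings; concatenation then yields the full path.
filename = "file.txt"
path = "/mnt/folder/subfolder/" + filename
print(path)  # /mnt/folder/subfolder/file.txt
```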
Assumptions: there are microservices behind an API gateway, and they communicate synchronously over HTTP. Obviously, each of those microservices is a web server. Now I want my microservice to also act as a Kafka producer and "consumer". More clea...
Hello!
I'm using databricks-connector to launch spark jobs using python.
I've validated that the python version (3.8.10) and runtime version (8.1) are supported by the installed databricks-connect (8.1.10).
Every time a mapPartitions/foreachParti...
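For background on the API being called, mapPartitions-style functions receive one iterator per partition and must return (or yield) an iterator of results. A pure-Python sketch of that contract, with illustrative data and function names rather than databricks-connect specifics:

```python
def double_rows(rows):
    # rows is an iterator over a single partition; yield one output per input.
    for r in rows:
        yield r * 2

# Simulate two partitions locally:
partitions = [[1, 2], [3, 4, 5]]
result = [out for part in partitions for out in double_rows(iter(part))]
print(result)  # [2, 4, 6, 8, 10]
```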