Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16856693631
by New Contributor II
  • 1692 Views
  • 1 replies
  • 0 kudos

Can you create Clusters via a REST API?

Yes, you can. See here: https://docs.databricks.com/dev-tools/api/latest/clusters.html. The JSON payload would look as follows: { "cluster_name": "my-cluster", "spark_version": "7.3.x-scala2.12", "node_type_id": "i3.xlarge", "spark_conf": { ...

Latest Reply
ManishPatil
New Contributor II
  • 0 kudos

One can create a cluster using the Clusters API at https://docs.databricks.com/dev-tools/api/latest/clusters.html#create. However, REST API 2.0 doesn't provide certain features, like "Enable Table Access Control", which were introduced after REST API ...

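The replies above can be combined into a minimal end-to-end sketch using only the Python standard library. The host and token are placeholders, the payload fields follow the example quoted in the answer, and nothing is called automatically:

```python
import json
import urllib.request

# Placeholder workspace URL and personal access token -- substitute your own.
DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
API_TOKEN = "<personal-access-token>"

# Payload mirroring the example quoted in the answer above.
payload = {
    "cluster_name": "my-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}

def create_cluster(host: str, token: str, body: dict) -> dict:
    """POST the payload to the Clusters API create endpoint and return the reply."""
    req = urllib.request.Request(
        url=f"{host}/api/2.0/clusters/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Uncomment to actually create the cluster:
# print(create_cluster(DATABRICKS_HOST, API_TOKEN, payload))
```

A successful call returns a JSON body containing the new cluster's `cluster_id`.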
BorislavBlagoev
by Valued Contributor III
  • 3542 Views
  • 2 replies
  • 1 kudos

DBUtils cannot find widgets [Windows 10]

I use databricks-connect to connect PyCharm to a Databricks cluster remotely, but when I try to use dbutils.widgets it throws an error. Cluster conf: spark.databricks.service.server.enabled true spark.databricks.hive.metastore.glueCatalog.enabled true ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

This is normal behavior. databricks-connect does not support the whole dbutils class: https://docs.databricks.com/dev-tools/databricks-connect.html#access-dbutils. Widgets are not on the list.

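A common workaround, sketched under the assumption that the same script runs both locally (via databricks-connect, where widgets are unsupported) and inside a notebook: wrap the widget lookup and fall back to a local default. The parameter names here are made up for illustration.

```python
def get_param(name: str, default: str) -> str:
    """Return a widget value when widgets work, else a local default.

    dbutils is only defined inside a Databricks session; under plain
    databricks-connect the lookup raises, so we fall back to the default.
    """
    try:
        return dbutils.widgets.get(name)  # noqa: F821 -- injected by Databricks
    except Exception:
        return default

# Locally this falls back to "dev"; in a notebook it reads the real widget.
env = get_param("environment", "dev")
```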
1 More Replies
bciampa
by New Contributor II
  • 20217 Views
  • 1 replies
  • 1 kudos

Unable to infer schema for Parquet at

I have this code in a notebook: val streamingDataFrame = incomingStream.selectExpr("cast (body as string) AS Content").withColumn("Sentiment", toSentiment($"Content")) import org.apache.spark.sql.streaming.Trigger.ProcessingTime val result = stre...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Seems like an invalid Parquet file. My guess is the incoming data has mixed types (for the same column) or a different/invalid structure.

cconnell
by Contributor II
  • 560 Views
  • 0 replies
  • 1 kudos

medium.com

I wrote a review of Koalas by porting an existing pandas program. Comments welcome: https://medium.com/@chuck.connell.3/pandas-on-databricks-via-koalas-a-review-9876b0a92541

haseebkhan1421
by New Contributor
  • 2468 Views
  • 1 replies
  • 3 kudos

How can I create a column on the fly which would have same value for all rows in spark sql query

I have a SQL query which I am converting into Spark SQL in Azure Databricks, running in my Jupyter notebook. In my SQL query, a column named Type is created on the fly, which has the value 'Goal' for every row: SELECT Type='Goal', Value FROM table. Now, when...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 3 kudos

The correct syntax would be: SELECT 'Goal' AS Type, Value FROM table

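The accepted answer in full, shown against a hypothetical table named `scores`: the string literal is evaluated once and repeated on every row.

```sql
-- 'Goal' is a string literal, so every row gets the same Type value.
SELECT 'Goal' AS Type, Value
FROM scores;
```

In PySpark the equivalent is `df.withColumn("Type", lit("Goal"))`.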
maheshwor
by New Contributor III
  • 1214 Views
  • 1 replies
  • 2 kudos

Resolved! Databricks Views

How do we find the definition of View in databricks?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 2 kudos

You can use the extended table description. For example, the following python code will print the current definition of the view: table_name = "" df = spark.sql("describe table extended {}".format(table_name)) df.createOrReplaceTempView("view_desript...

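Two ways to retrieve a view's definition in Spark SQL, shown against a hypothetical view named `my_view`: SHOW CREATE TABLE also accepts views and returns the full CREATE statement, while DESCRIBE TABLE EXTENDED (as in the reply above) exposes the definition in its "View Text" row.

```sql
-- Full CREATE VIEW statement, including the underlying SELECT
SHOW CREATE TABLE my_view;

-- Detailed metadata; look for the "View Text" row
DESCRIBE TABLE EXTENDED my_view;
```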
TimothyClotwort
by New Contributor
  • 4014 Views
  • 1 replies
  • 0 kudos

SQL Alter table command not working for me

I am a novice with Databricks. I am doing some independent learning. I am trying to add a column to an existing table. Here is my syntax: %sql ALTER TABLE car_parts ADD COLUMNS (engine_present boolean), which returns the error: SyntaxError: inva...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Is the table you are working with in the Delta format? The table commands (i.e. ALTER) do not work for all storage formats. For example, if I run the following commands then I can alter a table. Note - there is no data in the table, but the table exist...

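A minimal sketch of the sequence the reply describes, using a hypothetical Delta table. Also worth noting: a Python SyntaxError (rather than a SQL parse error) often means the %sql magic was not on the first line of the cell, so the cell ran as Python.

```sql
-- ALTER TABLE ... ADD COLUMNS works on Delta tables
CREATE TABLE car_parts (part_id INT, part_name STRING) USING DELTA;

ALTER TABLE car_parts ADD COLUMNS (engine_present BOOLEAN);
```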
rami1
by New Contributor II
  • 1682 Views
  • 1 replies
  • 0 kudos

Missing Databricks Datasets

Hi, I am looking at my Databricks workspace and it looks like I am missing the DBFS databricks-datasets root folder. The DBFS root folders I can view are FileStore, local_disk(), mnt, pipelines, and user. Can I mount databricks-datasets, or am I missing some...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

If you run the following command, do you receive an error, or do you just get an empty list? dbutils.fs.ls("/databricks-datasets")

NickGoodfella
by New Contributor
  • 1704 Views
  • 1 replies
  • 1 kudos

DNS_Analytics Notebook Problems

Hello everyone! First post on the forums. I've been stuck on this for a while now and cannot seem to understand why it is happening. Basically, I have been using what seems to be a premade Databricks notebook from Databricks themselves for a DNS Analytics exa...

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

@NickGoodfella, what's the notebook you're looking at? This one? https://databricks.com/notebooks/dns-analytics.html Are you sure all the previous cells executed? This suggests there isn't a model at the path that's expected. You can take a lo...

User16826994223
by Honored Contributor III
  • 1130 Views
  • 1 replies
  • 0 kudos

The State in-stream is growing too large in stream

I have a customer with a streaming pipeline from Kafka to Delta. They are leveraging RocksDB, watermarking for 30 minutes, and attempting to dropDuplicates. They are seeing their state grow to 6.2 billion rows on a stream that hits at maximum 7000 rows ...

Latest Reply
shaines
New Contributor II
  • 0 kudos

I've seen a similar issue with large state when using flatMapGroupsWithState. It is possible that (a) they are not using state.setTimeout correctly, or (b) they are not calling state.remove() when the stored state has timed out, leaving the state to gr...

PadamTripathi
by New Contributor II
  • 5381 Views
  • 2 replies
  • 1 kudos

how to calculate median on azure databricks delta table using sql

How do I calculate the median on Delta tables in Azure Databricks using SQL? select col1, col2, col3, median(col5) from delta table group by col1, col2, col3

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Try the percentile function, since the median is the 50th percentile: https://spark.apache.org/docs/latest/api/sql/#percentile

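A sketch of the suggested rewrite, reusing the question's column names against a hypothetical table `my_delta_table`: percentile(col, 0.5) computes the exact median, and percentile_approx is a cheaper alternative on large tables.

```sql
SELECT col1, col2, col3,
       percentile(col5, 0.5) AS median_col5
FROM my_delta_table
GROUP BY col1, col2, col3;
```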
1 More Replies
