Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aladda
by Databricks Employee
  • 4562 Views
  • 1 reply
  • 0 kudos

Resolved! I read that Delta supports concurrent writes to separate partitions of the table but I'm getting an error when doing so

I'm running 3 separate dbt processes in parallel. All of them read data from different Databricks databases and create different staging tables using a dbt alias, but at the end they all update/insert into the same target table. The 3 processes r...

Latest Reply
aladda
Databricks Employee
  • 0 kudos

You're likely running into the issue described here, and a solution to it as well. While Delta does support concurrent writers to separate partitions of a table, depending on your query structure (join/filter/where clauses in particular), there may still be a n...

aladda
by Databricks Employee
  • 14205 Views
  • 1 reply
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

The Databricks Add-on for Splunk, built as part of Databricks Labs, can be leveraged for Splunk integration. It's a bi-directional framework that allows for in-place querying of data in Databricks from within Splunk by running queries, notebooks or jobs ...

Anonymous
by Not applicable
  • 8035 Views
  • 1 reply
  • 1 kudos

Resolved! Jobs - Delta Live tables difference

Can you please explain the difference between Jobs and Delta Live tables?

Latest Reply
aladda
Databricks Employee
  • 1 kudos

Jobs are designed for automated execution (scheduled or manual) of Databricks Notebooks, JARs, spark-submit jobs, etc. It's essentially a generic framework to run any kind of Data Engineering, Data Analysis or Data Science workload. Delta Live Tables, on the...

aladda
by Databricks Employee
  • 6079 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Notebooks in Databricks are part of the WebApp, which is run and managed by Databricks from the Control Plane. See the high-level architecture here for details: https://docs.databricks.com/getting-started/overview.html

tj-cycyota
by Databricks Employee
  • 3192 Views
  • 1 reply
  • 0 kudos

Resolved! How can I make a cluster start up in the availability-zone (AZ) with the most available IPs?

I see the default in the UI is to always create clusters in a single AZ (e.g. us-west-2a), but want to distribute workloads across all available AZs.

Latest Reply
tj-cycyota
Databricks Employee
  • 0 kudos

Found the answer: it's not available in the UI, but via the API you can submit the cluster definition with "aws_attributes": { "zone_id": "auto" }. This is documented in the Clusters API: https://docs.databricks.com/dev-tools/api/latest/clusters.html#aw...

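As a rough sketch of what the reply above describes (the cluster name, runtime, and node type below are placeholder values I've made up; only the "aws_attributes" block comes from the answer), a Clusters API request body might look like:

```python
import json

# Hypothetical cluster spec for the Databricks Clusters API
# (POST /api/2.0/clusters/create). Everything except the
# "aws_attributes" block is a placeholder for illustration.
cluster_spec = {
    "cluster_name": "auto-az-cluster",      # placeholder name
    "spark_version": "13.3.x-scala2.12",    # placeholder runtime
    "node_type_id": "i3.xlarge",            # placeholder node type
    "num_workers": 2,
    "aws_attributes": {
        # let Databricks pick the AZ with the most available IPs
        "zone_id": "auto"
    },
}

payload = json.dumps(cluster_spec)
# You would then send `payload` to the workspace's
# /api/2.0/clusters/create endpoint with a bearer token.
```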
User16137833804
by Databricks Employee
  • 10296 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

There is a native Cost Management Connector in Power BI that allows one to build powerful, customized visualizations and cost/usage reports. I also recommend reviewing the Chargeback/Cost Analysis section of the ADB Best Practices guide here - https://...

1 More Reply
Ryan_Chynoweth
by Databricks Employee
  • 3392 Views
  • 1 reply
  • 1 kudos
Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 1 kudos

Yes, Azure Data Factory can execute code on Azure Databricks. The best way to return values from the notebook to Data Factory is to use the dbutils.notebook.exit() function at the end of your notebook, or whenever you want to terminate execution.

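To illustrate the pattern in the reply above, here is a minimal sketch; the result fields are made up for illustration, and dbutils is only available inside a Databricks notebook, so the actual exit call is shown as a comment:

```python
import json

# Hypothetical result a notebook might hand back to Azure Data Factory.
# The field names and values are placeholders.
result = {"status": "success", "rows_processed": 1250}
serialized = json.dumps(result)

# Inside a Databricks notebook you would end with:
#   dbutils.notebook.exit(serialized)
# Data Factory then reads this string from the Notebook activity's output.
print(serialized)
```

Data Factory receives whatever string is passed to dbutils.notebook.exit(), so serializing a dict to JSON keeps the output easy to parse downstream.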
Anonymous
by Not applicable
  • 14271 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

The key benefits of High Concurrency clusters are that they provide fine-grained sharing for maximum resource utilization and minimum query latencies. Note that a Standard cluster is recommended for a single user. Standard clusters can run workloads d...

University_RobR
by Databricks Employee
  • 1631 Views
  • 1 reply
  • 0 kudos

What Databricks resources are available for university faculty members?

I would like to use Databricks to teach large-scale analytics in my classroom; does Databricks have any resources or community assets that can help me out?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

For folks looking to leverage Databricks as a teaching asset, please contact us through the Databricks University Alliance: https://databricks.com/p/teach

University_RobR
by Databricks Employee
  • 5118 Views
  • 1 reply
  • 0 kudos

What Databricks resources are available for university students?

I want to learn how to use Databricks for my courses at university, and maybe to get a Databricks Certification. Can you help me out?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

We have a ton of great resources available for people who want to learn Databricks, specifically university students. Check out our university page to learn more about Databricks Community Edition, free workshops, and self-paced course...

User16826994223
by Databricks Employee
  • 1595 Views
  • 1 reply
  • 0 kudos

Efficient data retrieval process between Azure Blob storage and Azure databricks

I am trying to design a streaming data analytics project using Functions --> Event Hub --> Storage --> Azure Data Factory --> Databricks --> SQL Server. What I am struggling with at the moment is how to optimize "data retrieval" to feed m...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

Check out our Auto Loader capabilities, which can automatically track and process files that need to be processed: Autoloader. There are two options: directory listing, which is essentially completing the same steps that you have listed above but in a sl...

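As a hedged sketch of the Auto Loader setup the reply mentions (the source format, schema location, and input path below are placeholders; the actual stream only runs on a Databricks cluster, so that part is shown as comments):

```python
# Placeholder Auto Loader ("cloudFiles") options. Setting
# cloudFiles.useNotifications to "false" selects directory-listing mode,
# the first of the two options described in the reply; "true" would use
# file-notification mode instead.
autoloader_options = {
    "cloudFiles.format": "json",                # placeholder source format
    "cloudFiles.useNotifications": "false",     # directory listing mode
    "cloudFiles.schemaLocation": "/tmp/schema", # placeholder schema path
}

# In a Databricks notebook this dict would be used roughly as:
#   stream = (spark.readStream.format("cloudFiles")
#             .options(**autoloader_options)
#             .load("/mnt/landing/"))           # placeholder input path
```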
User16826992666
by Databricks Employee
  • 3471 Views
  • 1 reply
  • 0 kudos

Resolved! Can you implement fine grained access controls on Delta tables?

I would like to provide row and column level security on my tables I have created in my workspace. Is there any way to do this?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition. current_user(): returns the current user name. is_member(): determines if the current user is a member of a s...

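To make the two functions concrete, here is a sketch of a dynamic view; the table, view, column, and group names are all made up for illustration, and in a notebook you would execute the statement with spark.sql():

```python
# Hypothetical dynamic view combining column-level masking (is_member)
# with row-level filtering (current_user). All identifiers are placeholders.
view_sql = """
CREATE OR REPLACE VIEW sales_redacted AS
SELECT
  order_id,
  region,
  -- only members of the (hypothetical) 'finance' group see amounts
  CASE WHEN is_member('finance') THEN amount ELSE NULL END AS amount
FROM sales_raw
-- row-level filter: each user sees only rows tagged with their user name
WHERE owner = current_user()
"""
# In a Databricks notebook: spark.sql(view_sql)
```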
User16826987838
by Databricks Employee
  • 1630 Views
  • 1 reply
  • 1 kudos
Latest Reply
Mooune_DBU
Databricks Employee
  • 1 kudos

With Koalas, which is a Pandas API on top of Spark DataFrames, there should be minimal code changes required. Please refer to this blog for more info.

User16765131552
by Databricks Employee
  • 8731 Views
  • 1 reply
  • 0 kudos

Read excel files and append to make one data frame in Databricks from azure data lake without specific file names

I am storing Excel files in Azure Data Lake (Gen 1). The filenames follow the same pattern, e.g. "2021-06-18T09_00_07ONR_Usage_Dataset", "2021-06-18T09_00_07DSS_Usage_Dataset", etc., depending on the date and time. I want to read all the files in th...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

If you are attempting to read all the files in a directory, you should be able to use a wildcard and filter by extension. For example: df = (spark .read .format("com.crealytics.spark.excel") .option("header", "True") .option("inferSchema", "tr...

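The wildcard idea in the reply can be illustrated locally with a glob-style filename match; the first two file names come from the question, the third and the listing itself are made up (in a real workspace the listing would come from something like dbutils.fs.ls()):

```python
from fnmatch import fnmatch

# Placeholder directory listing; the "*_Usage_Dataset" pattern selects the
# usage files regardless of their timestamp prefix.
listing = [
    "2021-06-18T09_00_07ONR_Usage_Dataset",
    "2021-06-18T09_00_07DSS_Usage_Dataset",
    "2021-06-18T09_00_07_some_other_file",   # made-up non-matching file
]
matches = [name for name in listing if fnmatch(name, "*_Usage_Dataset")]
# Each matching file could then be read and unioned into one DataFrame.
```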
HowardWong
by Databricks Employee
  • 2119 Views
  • 1 reply
  • 0 kudos

How many users can the JDBC endpoint support in the All Purpose HC?

What is the maximum number of users the JDBC endpoint can support in an All Purpose high concurrency cluster? To support more SQL workloads, is it better to go with Databricks SQL Endpoints?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

There is an execution context limit of 145, which means you can have at most 145 notebooks attached to a cluster. https://kb.databricks.com/execution/maximum-execution-context.html If you are primarily using SQL, then Databricks SQL Endpoints wo...
