Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Constantine
by Contributor III
  • 13160 Views
  • 2 replies
  • 6 kudos

Resolved! CREATE TEMP TABLE FROM CTE

I have written a CTE in Spark SQL: WITH temp_data AS ( ...... ) CREATE VIEW AS temp_view FROM SELECT * FROM temp_view; I get a cryptic error. Is there a way to create a temp view from a CTE using Spark SQL in Databricks?

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

In the CTE you can't do a CREATE. It expects an expression in the form of expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( query ), where expression_name specifies a name for the common table expression. If you want to create a view from a CTE, y...
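The reply above is cut off, so the following is only a sketch of one common way to do this, not necessarily what the reply goes on to suggest: put the CREATE first and move the WITH clause inside the view definition. The helper names and the `temp_view` / `temp_data` identifiers are hypothetical, and `spark` is assumed to be an existing SparkSession passed in by the caller.

```python
def temp_view_from_cte_sql(view_name: str, cte_name: str, cte_query: str) -> str:
    """Build a statement that places the CTE inside a temporary view definition."""
    return (
        f"CREATE OR REPLACE TEMPORARY VIEW {view_name} AS\n"
        f"WITH {cte_name} AS (\n"
        f"  {cte_query}\n"
        f")\n"
        f"SELECT * FROM {cte_name}"
    )


def create_temp_view_from_cte(spark, view_name: str, cte_name: str, cte_query: str) -> str:
    """Run the statement on a SparkSession supplied by the caller (assumption)."""
    statement = temp_view_from_cte_sql(view_name, cte_name, cte_query)
    spark.sql(statement)
    return statement


# Hypothetical names, for illustration only.
print(temp_view_from_cte_sql("temp_view", "temp_data", "SELECT 1 AS id"))
```

In a notebook this would be called as `create_temp_view_from_cte(spark, "temp_view", "temp_data", "SELECT ...")`, after which `temp_view` can be queried for the rest of the session.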

1 More Replies
test_123
by New Contributor
  • 834 Views
  • 1 replies
  • 0 kudos

Autoloader not detecting changes/updated values for xml file

If I update a value in the XML file, Autoloader does not detect the change. The same happens when I delete or remove a column or property in the XML. Please help me fix this issue.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

It seems that the issue you're experiencing with Autoloader not detecting changes in XML files might be related to how Autoloader handles schema inference and evolution. Autoloader can automatically detect the schema of loaded XML data, allowing you...

SyedGhouri
by New Contributor III
  • 7711 Views
  • 2 replies
  • 0 kudos

Cannot create jobs with jobs api - Azure databricks - private network

Hi, I'm trying to deploy Databricks jobs from the dev environment to the prod environment. I have jobs in the dev environment and, using Azure DevOps, I deployed the jobs in code format to the prod environment. Now when I use the POST method to create the job programmatica...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@SyedGhouri You need to set up a self-hosted Azure DevOps agent inside your VNet.

1 More Replies
pshuk
by New Contributor III
  • 2814 Views
  • 2 replies
  • 0 kudos

Copying files from dev environment to prod environment

Hi, is there a quick and easy way to copy files between different environments? I have copied a large number of files to my dev environment (Unity Catalog) and want to copy them over to the production environment. Instead of doing it from scratch, can I j...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

If you want to copy files in Azure, ADF is usually the fastest option (for example, terabytes of CSV or Parquet files). If you want to copy tables, just use CLONE. If the files contain code, just use Repos and branches.
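For the table case, a minimal sketch of what a CLONE statement could look like, assuming both tables are Delta tables reachable from the same metastore (which may not hold for separate dev and prod workspaces). The helper name and the `dev.sales.orders` / `prod.sales.orders` table names are hypothetical.

```python
def clone_table_sql(source: str, target: str, deep: bool = True) -> str:
    """Build a Delta CLONE statement; a deep clone copies data, a shallow clone only references it."""
    kind = "DEEP" if deep else "SHALLOW"
    return f"CREATE OR REPLACE TABLE {target} {kind} CLONE {source}"


# Hypothetical three-level names, for illustration only.
statement = clone_table_sql("dev.sales.orders", "prod.sales.orders")
print(statement)
# In a notebook, the statement would be passed to spark.sql(statement).
```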

1 More Replies
dbx-user7354
by New Contributor III
  • 3136 Views
  • 2 replies
  • 1 kudos

Pyspark Dataframes orderby only orders within partition when having multiple worker

I came across a PySpark issue when sorting a DataFrame by a column. It seems that PySpark only orders the data within partitions when there are multiple workers, even though it shouldn't.  from pyspark.sql import functions as F import matplotlib.pyplot...

Latest Reply
MarkusFra
New Contributor III
  • 1 kudos

@Retired_mod Sorry to ask again, but I am a bit confused by this. I thought that PySpark's `orderBy()` and `sort()` do a shuffle operation before sorting for exactly this reason. There is another command, `sortWithinPartitions()`, that does ...
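To make the distinction in this reply concrete, here is a plain-Python simulation (not Spark itself) of the two behaviours: sorting each partition in place versus redistributing rows into ordered ranges first. The function names and the sample partitions are made up for illustration.

```python
from typing import List


def sort_within_partitions(partitions: List[List[int]]) -> List[List[int]]:
    """Sort each partition independently; no rows move between partitions."""
    return [sorted(p) for p in partitions]


def global_sort(partitions: List[List[int]]) -> List[List[int]]:
    """Stand-in for a shuffle: gather and sort all rows, then cut them into
    contiguous ranges so every value in partition i is <= every value in partition i+1."""
    n = len(partitions)
    rows = sorted(x for p in partitions for x in p)
    size = -(-len(rows) // n)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(n)]


def flatten(partitions: List[List[int]]) -> List[int]:
    """Read the partitions back in order, as a collect would."""
    return [x for p in partitions for x in p]


data = [[5, 1, 9], [4, 8, 2], [7, 3, 6]]
print(flatten(sort_within_partitions(data)))  # ordered only inside each partition
print(flatten(global_sort(data)))             # ordered across all partitions
```

Reading the partitions back in order gives a fully sorted result only in the second case, which is the behaviour the reply expects from `orderBy()` and `sort()`.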

1 More Replies
aseufert
by New Contributor III
  • 10000 Views
  • 2 replies
  • 3 kudos

Git Stash

I looked through some previous posts and the documentation and couldn't find anything related to the use of Git stash in Databricks Repos. Perhaps I missed it. I also don't see an option in the UI. Does anyone know if there's a way to stash changes either in th...

Latest Reply
javierbg
New Contributor III
  • 3 kudos

This is actually a big hurdle when trying to switch between two different branches; it would be a welcome addition to the Databricks IDE.

1 More Replies
test_123
by New Contributor
  • 6294 Views
  • 0 replies
  • 0 kudos

Schema evolution is not working for XML file

I have used .option("cloudFiles.schemaEvolutionMode", "addNewColumns") for a newly added property in the XML file, but Autoloader did not detect the change. As per the .option("cloudFiles.schemaEvolutionMode", "addNewColumns") behavior, it has failed at first t...
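With `addNewColumns`, the stream is expected to stop when a new column appears and to pick up the updated schema after a restart. A rough sketch of a restart wrapper follows, assuming a hypothetical `start_stream` callable that builds the reader, starts the query, and blocks until it ends. Matching on the text "UnknownFieldException" is an assumption about how the error surfaces and may need adjusting.

```python
def looks_like_schema_change(exc: Exception) -> bool:
    """Assumption: the schema-change failure mentions UnknownFieldException."""
    return "UnknownFieldException" in f"{type(exc).__name__}: {exc}"


def run_with_restarts(start_stream, max_restarts: int = 3, is_schema_change=looks_like_schema_change):
    """Call start_stream again after a schema-change failure, up to max_restarts times.

    Any other exception, or a failure after the restart budget is used up, is re-raised.
    """
    restarts = 0
    while True:
        try:
            return start_stream()
        except Exception as exc:
            if restarts >= max_restarts or not is_schema_change(exc):
                raise
            restarts += 1
```

In a job, the same effect is often achieved by letting the task fail and relying on the job's retry settings instead of an in-process loop.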

JohanS
by New Contributor III
  • 1692 Views
  • 1 replies
  • 0 kudos

Resolved! Container Service Docker images fail when a pip package is installed

I'm building my own Docker images to use for a cluster. The problem is that the only image I seem to be able to run is the official base image "databricksruntime/python:13.3-LTS". If I install a pip package, I get the following on standard error: /dat...

Data Engineering
container service
Docker
pip
python
Latest Reply
JohanS
New Contributor III
  • 0 kudos

I found the culprit: --ignore-installed upgraded matplotlib too much, and broke it.

Arun2151
by New Contributor II
  • 1608 Views
  • 1 replies
  • 2 kudos

spark.sql query is executing from the except block even though the try block is succeeded

I have developed an Azure Databricks notebook where data is copied from the landing zone to the STG Delta table. I used try and except blocks in the code to catch errors; if there is an error, the except block will catch the error message. In the except...

Latest Reply
Arun2151
New Contributor II
  • 2 kudos

below is my code

Hubert-Dudek
by Esteemed Contributor III
  • 1100 Views
  • 1 replies
  • 1 kudos

R2 as external location

R2 (egress-free) can now be quickly registered as an external location. You can use it not only for Delta Sharing! #databricks

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Thank you for sharing this @Hubert-Dudek!!!

dmart
by New Contributor III
  • 5365 Views
  • 12 replies
  • 0 kudos

can't delete 50TB of overpartitioned data from dbfs

I need to delete 50TB of data out of DBFS storage. It is overpartitioned and dbutils does not work. Also, limiting partition size and iterating over the data to delete it doesn't work. Azure locks access to the storage through the resource group permissions and ...

Latest Reply
dmart
New Contributor III
  • 0 kudos

For anyone else with this issue: there is no solution other than deleting the whole Databricks workspace, which then deletes all the resources locked up in the managed resource group. The data could not be deleted in any other way, not even by Microso...

11 More Replies
demost11
by New Contributor II
  • 1173 Views
  • 0 replies
  • 0 kudos

Databricks Connect Passthrough

I'm using the Databricks Connect VS Code plugin. It's cool how it figures out what needs to run on the cluster vs. locally. However, is it possible to force it to run specific Python statements remotely instead of locally? For context, th...

IshaBudhiraja
by New Contributor II
  • 1242 Views
  • 0 replies
  • 0 kudos

Installation of external libraries (wheel file) in Databricks through Synapse using a new job cluster

Aim: Installation of external libraries (wheel file) in Databricks through Synapse using a new job cluster. Solution: I have followed the steps below: I have created a pipeline in Synapse that consists of a notebook activity that is using a new job cluster...

Dikshant
by New Contributor
  • 1747 Views
  • 0 replies
  • 0 kudos

SchemaEvolutionMode exception in Databricks 14.2

I am unable to display the stream below after reading it. df = spark.readStream.format("cloudFiles")\ .option("cloudFiles.format", "csv")\ .option("header", "true")\ .option("delimiter", "\t")\ .option("inferSchema", "true")\ .option("cloudFiles.connectionS...

Data Engineering
schemaEvolutionMode
