- 1274 Views
- 5 replies
- 1 kudos
I ran a query "Select * from fact_Orders". This presented a lot of garbage: the correct column headers, but the contents were extremely random, e.g. blanks in the key column, VAT rates of 12282384234E-45. When I export to CSV, it presents fi...
Latest Reply
Hi @AndLuffman, The issue you're experiencing might be related to the limitations of the Databricks interface when dealing with large datasets with many columns. The interface has a limit on the number of rows it can display at once, which can lead t...
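One way around the UI's display/export limits, sketched below, is to write the full result set out with Spark instead of exporting from the results grid; the output path is a placeholder:
# Sketch: bypass the notebook results grid by writing the full query result
# from Spark itself. The output path below is a placeholder.
df = spark.sql("SELECT * FROM fact_Orders")

(df.coalesce(1)                      # one output file; drop this for very large tables
   .write
   .mode("overwrite")
   .option("header", True)
   .csv("/mnt/exports/fact_orders_csv"))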
4 More Replies
- 820 Views
- 2 replies
- 1 kudos
I realize that the best practice would be to integrate our service with Kafka as a streaming source for Databricks, but given that the service already stores data into TimescaleDB, how can I stream data from TimescaleDB into DBX? Debezium doesn't wor...
Latest Reply
Hi @Erik_L, Currently, there is no direct way to stream data from TimescaleDB into Databricks.
However, there are a couple of ways you can approach this:
1. **Kafka Integration**: You can integrate Kafka into your service for consuming data. Kafka i...
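A minimal sketch of the Kafka route mentioned above, assuming the service publishes its writes to a Kafka topic; the broker address, topic, checkpoint path, and target table are placeholders:
# Sketch: consume the service's events from Kafka with Structured Streaming
# and land them in a Delta table. Broker, topic, and names are placeholders.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")
       .option("subscribe", "timescale-events")
       .option("startingOffsets", "earliest")
       .load())

events = raw.selectExpr("CAST(key AS STRING) AS key",
                        "CAST(value AS STRING) AS payload",
                        "timestamp")

(events.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/timescale_events")
       .toTable("bronze.timescale_events"))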
1 More Replies
by kg6ka • New Contributor
- 1183 Views
- 2 replies
- 1 kudos
Hey, guys. I have a question. I have Databricks jobs in a workflow that are linked to my Databricks repo, which contains the necessary scripts for one or another job. That is, the job is linked to the Databricks repo. The main code is developed in gi...
Latest Reply
Hi @kg6ka, Based on the provided information, you are trying to push code from GitHub to a Databricks repo using the Databricks REST API.
However, the error message you are getting indicates that you are missing Git provider credentials.
According to the...
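For illustration, a rough sketch of supplying Git provider credentials and then pointing the repo at a branch via the REST API; the workspace URL, tokens, repo ID, and branch are placeholders, so verify the calls against the current Git credentials and Repos API docs:
import requests

# Sketch: register Git provider credentials, then update the Databricks repo.
# Host, tokens, repo_id, and branch below are placeholders.
host = "https://<workspace-url>"
headers = {"Authorization": "Bearer <databricks-pat>"}

# 1) Store Git provider credentials so Repos API calls can authenticate to GitHub.
requests.post(
    f"{host}/api/2.0/git-credentials",
    headers=headers,
    json={
        "git_provider": "gitHub",
        "git_username": "<github-user>",
        "personal_access_token": "<github-pat>",
    },
)

# 2) Pull the branch that contains the newly pushed code into the repo.
repo_id = "<repo-id>"
requests.patch(
    f"{host}/api/2.0/repos/{repo_id}",
    headers=headers,
    json={"branch": "main"},
)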
1 More Replies
- 334 Views
- 0 replies
- 0 kudos
I got a notebook running on DBR 12.2 with the following R code:
install.packages("microbenchmark")
install.packages("furrr")
library(microbenchmark)
library(tidyverse)
# example tibble
df_test <- tibble(id = 1:100000, street_raw = rep("Bahnhofs...
- 1610 Views
- 4 replies
- 2 kudos
I am trying to run the code below to subscribe to a Pub/Sub topic, but it is throwing this exception: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/DataSourceV2. I have tried using all versions of https://mvnrepository.com/artifact/com.google...
Latest Reply
Hi @210573, Databricks now supports Pub/Sub streaming natively, so you can use Pub/Sub streaming for your use case. For more info, visit the official URL: PUB/SUB with Databricks.
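A minimal sketch of the native connector, assuming DBR 13.1 or later; the subscription, project, and service-account options below are placeholders, so check the linked page for the exact option names and auth setup:
# Sketch: read from Google Pub/Sub with the native Structured Streaming source
# and land the events in Delta. All IDs and credentials are placeholders.
events = (spark.readStream
          .format("pubsub")
          .option("subscriptionId", "my-subscription")
          .option("topicId", "my-topic")
          .option("projectId", "my-gcp-project")
          .option("clientEmail", "<service-account-email>")
          .option("clientId", "<client-id>")
          .option("privateKeyId", "<private-key-id>")
          .option("privateKey", "<private-key>")
          .load())

(events.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/pubsub_demo")
       .toTable("bronze.pubsub_events"))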
3 More Replies
- 975 Views
- 3 replies
- 1 kudos
In a Scala notebook in Databricks, I created a temporary function with a certain JAR and class name. Then I want to update the JAR. But without restarting the context, I cannot reload the new JAR; the temporary function always reuses the old classes....
Latest Reply
Hi @vonjack, I'm sorry, but based on the information provided, there isn't a direct way to refresh the classes of a temporary function in Databricks without restarting the context.
Databricks does support updating JARs, including replacing a default...
2 More Replies
- 957 Views
- 2 replies
- 0 kudos
Hi, I have a structured streaming job that reads from a Delta table "A" and pushes to another Delta table "B". A schema: group_key, id, timestamp, value. B schema: group_key, watermark_timestamp, derived_value. One requirement is that I need to get the m...
Latest Reply
Navigating the intricacies of structured streaming and Delta table operations on the same platform has been a stimulating yet demanding task. The community at Databricks has been instrumental in clarifying nuances. As I delve deeper, I'm reminded of ...
1 More Replies
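For context, the pipeline described in the question (stream from Delta table A into Delta table B) would be wired up roughly like the sketch below; the checkpoint path is a placeholder and the logic that produces watermark_timestamp / derived_value is not shown:
# Sketch: stream from Delta table A into Delta table B. The transformation
# that computes derived_value is omitted; the checkpoint path is a placeholder.
source = (spark.readStream
          .format("delta")
          .table("A"))

derived = source.select("group_key", "timestamp", "value")  # placeholder projection

(derived.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/a_to_b")
        .toTable("B"))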
- 387 Views
- 1 reply
- 0 kudos
I have a question regarding streaming with CDC. We currently have a Delta table where CDC is not yet enabled, and it's the source for other downstream streams that read from that table. To catch the changes for a new use case, we need to enable CDC on tha...
Latest Reply
Hi @Michael_Galli, Enabling CDC on your Delta table should not directly affect your existing downstream streams. However, there are some considerations you should keep in mind:
• If the schema of the Delta table changes after a streaming read has beg...
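A small sketch of what enabling and then consuming the change feed could look like; the table name and starting version are placeholders:
# Sketch: enable the change data feed on the existing Delta table, then read
# the changes for the new use case. Table name and startingVersion are placeholders.
spark.sql("""
  ALTER TABLE my_schema.my_table
  SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

changes = (spark.readStream
           .format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 42)
           .table("my_schema.my_table"))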
- 1421 Views
- 2 replies
- 0 kudos
We have a SQL database designed as a star schema. We are migrating data from SQL to Databricks. There are cubes designed using SSAS. These cubes are used by end users in Excel for analysis purposes. We are now looking for a solution for: 1) Can...
Latest Reply
Hi @shraddharane,
1) Can cubes be migrated?
No, SSAS cubes cannot be directly migrated to Databricks. Databricks does not support the concept of multidimensional cubes like SSAS. Databricks is a Lakehouse architecture built on the foundation of Delta ...
1 More Replies
by 140015 • New Contributor III
- 967 Views
- 3 replies
- 1 kudos
Hi, I would like to know what you think about using Delta Live Tables when the source for this pipeline is not incremental. What I mean by that is: suppose the data provider creates a new folder with files for me each time it has an update to the...
Latest Reply
When dealing with B2B data building, the process of updating and managing your data can present unique challenges. Since your data updates involve new folders with files and you need to process the entire new folder, the concept of incremental proces...
2 More Replies
- 3136 Views
- 12 replies
- 5 kudos
Is there an API call to set the "Table access control" workspace config?
Latest Reply
Hi @GNarain,
Here is an example of the API call; could you try it and let us know?
POST /api/2.0/workspace/update
{ "workspaceAccessControlEnabled": true }
This API call will enable table access control for your workspace. You can make this API call u...
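A quick sketch of issuing that call from Python; the workspace URL and token are placeholders, and the endpoint and field name are taken from the reply above, so verify them against the current REST API reference:
import requests

# Sketch: call the endpoint quoted above to enable table access control.
# Workspace URL and token are placeholders.
resp = requests.post(
    "https://<workspace-url>/api/2.0/workspace/update",
    headers={"Authorization": "Bearer <databricks-pat>"},
    json={"workspaceAccessControlEnabled": True},
)
print(resp.status_code, resp.text)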
11 More Replies
- 814 Views
- 1 reply
- 2 kudos
Could you please give me an idea of how I can start reprocessing my data? Imagine I have a folder in ADLS Gen2, "/test", with binaryFiles. They have already been processed with the current pipeline. I want to reprocess the data and continue receiving new data. What t...
Latest Reply
@Eldar_Dragomir In order to reprocess the data, we have to change the checkpoint directory. This will start processing the files from the beginning. You can use cloudFiles.maxFilesPerTrigger to limit the number of files getting processed per micro-...
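A rough sketch of that approach for the binaryFile folder described in the question; the ADLS path, checkpoint location, target table, and trigger limit are placeholders:
# Sketch: reprocess everything under /test by pointing Auto Loader at a fresh
# checkpoint directory, throttled with cloudFiles.maxFilesPerTrigger.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "binaryFile")
          .option("cloudFiles.maxFilesPerTrigger", 500)
          .load("abfss://container@account.dfs.core.windows.net/test"))

(stream.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/test_reprocess_v2")  # new checkpoint dir
       .toTable("bronze.test_binary"))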
- 720 Views
- 1 reply
- 1 kudos
I am trying to run a notebook which reads some of its variables from an external notebook (I used the %run command for that purpose), but it keeps giving me an error that these variables are not defined. This sequence of notebooks runs perfectly fine on a...
Latest Reply
Atanu • Esteemed Contributor
I think the issue here is that the variable is not created until a value is assigned to it. So, you may need to assign a value to get_sql_schema.
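A tiny sketch of the pattern; the notebook name and the value assigned to get_sql_schema are illustrative:
# Sketch: the included notebook must assign the variable before the caller uses it.
# --- contents of ./config_notebook (the notebook pulled in via %run) ---
get_sql_schema = "SELECT column_name, data_type FROM information_schema.columns"

# --- calling notebook ---
# %run ./config_notebook
print(get_sql_schema)  # defined only because the included notebook assigned it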
- 421 Views
- 1 reply
- 0 kudos
I have 2 questions about elapsed job runtimes. The same Scoring notebook is run 3 times as 3 jobs. The jobs are identical, with the same PetaStorm code, CPU cluster config (not a Spot cluster), and data, but they have varying elapsed runtimes. Elapsed runtimes...
Latest Reply
Hi @NathanLaw, could you please confirm whether you have set any parameters for the best model? Does it stop after running some epochs if there is no improvement in the model performance?
- 412 Views
- 1 reply
- 1 kudos
Hi all, we are planning to develop an Autoloader-based DLT pipeline that needs to be deployable via a CI/CD pipeline and observable. Can somebody please point me to source code that we can start from as a firm foundation instead of falling into a newbie-pattern ...
Latest Reply
Hi @Sanjay_AMP, Delta Live Tables and Auto Loader can be used together to incrementally ingest data from cloud object storage.
• Python code example:
- Define a table called "customers" that reads data from a CSV file in cloud object storage.
- Define a...
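Along those lines, a starter sketch of the first bullet; the landing path is a placeholder:
# Sketch: a DLT table that uses Auto Loader to incrementally ingest CSV files
# from cloud object storage. The landing path below is a placeholder.
import dlt

@dlt.table(comment="Customers ingested incrementally with Auto Loader")
def customers():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", "true")
            .load("abfss://landing@account.dfs.core.windows.net/customers/"))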